tion and Maximization (EM) algorithm. The model is estimated on the National
Longitudinal Survey of Youth 1997. The estimates suggest that both direct and 
indirect learning play an important role in early career wage growth, with those 
with the lowest levels of education achieving the largest increases. 


JEL codes: J24, J31, J62 

Keywords: Occupational choice, correlated learning, dynamic discrete choice, 
expectation and maximization algorithm. 


Jonathan James is at the Federal Reserve Bank of Cleveland. He can be reached 
at jwj8@duke.edu. He is grateful to Peter Arcidiacono, Joe Hotz, and Pat Bayer
for providing advice, comments, and encouragement. He extends special thanks 
to seminar participants at Duke University, UNC-Chapel Hill, the University of 
Pittsburgh, and the Federal Reserve Bank of Cleveland for helpful comments and 
feedback. 






1 Introduction 


A salient feature of the labor market is the high degree of occupational mobility of younger workers. 
Kambourov and Manovskii (2008) find that male workers, ages 23-28 with at most a high school 
degree have a 30% chance of changing occupations in a given year, even when occupation is defined 
using a very broad level of aggregation (the census 1-digit level). The leading models that address 
this feature of the labor market are the occupational matching models of Miller (1984), Neal 
(1999), and Pavan (2009). These models, along with the job shopping literature of Johnson (1978), 
Jovanovic (1979), and Topel and Ward (1992), argue that this high frequency of occupational 
change is the result of an experimentation and learning process that younger workers undergo to 
search across different types of employment to find their comparative advantage. 

These matching models have provided a powerful framework for understanding early career 
occupational mobility and wage growth. However, with few exceptions (e.g. Shaw (1987)), this 
literature has made the strong assumption that the nature of occupational match is exclusively 
occupation specific. This independence assumption implies that as an individual learns about their 
ability in one occupation, this experience provides no additional information about their ability in 
any other occupation. While this simplifying assumption eases the computational burden of these
models, it severely limits how much we can learn about occupational sorting and
mobility and its implications for worker welfare. 1

This paper develops and estimates an occupational matching model that relaxes this assumption 
of independence. Namely, occupational matches will share a robust correlation structure, such that 
as a worker learns about their ability in one occupation, it will be informative about their abilities 
in all occupations. Relaxing this assumption fundamentally changes the worker’s problem, greatly 
broadening the model’s usefulness and our understanding of why workers change occupations. 

1 Learning models outside of labor economics have also assumed independence. For example, Crawford and Shum
(2005) evaluate a pharmaceutical matching model in which patients are uncertain about their innate responsiveness
to anti-ulcer medications. The patient’s experience with their current medication does not affect their choice of
subsequent medications. Likewise, Ackerberg (2003) assumes independence in an advertising and learning model over
yogurt products. Allowing for correlation in his model modifies the consumer’s problem in a number of interesting
ways.

The focus of the previous literature has been on occupational mobility due to learning about a
poor match. An important contribution of the paper is that by relaxing the independence assumption,
we are able to generalize the role of learning to account for many new patterns of mobility that
we observe in the data. Specifically, the model will additionally be able to rationalize 1) occupational
changes due to promotion, where high ability workers change occupations precisely because
they are high ability, 2) occupational clustering, where workers tend to switch
among a handful of apparently similar occupations (e.g. between business operations occupations
and professional occupations), and 3) occupational cycling, where workers leave an occupation,
only to return a few periods later. 2 In more general terms, correlated learning modifies the
worker’s problem so that learning now has an important two-dimensional role:
it affects the decision to change occupations and drives the decision of which occupation
to go to next. 3 Generalizing the learning framework to account for these other mobility patterns is
an important step for understanding more thoroughly how workers substitute across different types
of employment.

The second contribution of the paper is that it extends the occupational matching literature by 
simultaneously endogenizing the education decision along with occupational choice and learning. 
The previous literature abstracts from the education decision and estimates models on individuals
with homogeneous education. Miller (1984) argues that the data indicate that college graduates
appear to make more informed decisions about occupational matches than high school graduates. 
The ability matching model allows individuals to have heterogeneous ability for schooling. In an 
intuitive way, through the occupational correlation structure, the model allows schooling ability to 
be informative about prospective occupational matches. Endogenizing education in this manner will 
provide important insight into the relationship between educational attainment, the occupational 
learning process, and the source of information differentials among education groups. 

2 Kambourov and Manovskii (2008) show that about 30% of occupational changers return to their previous
1-digit occupation within four years.

3 The independence assumption allows workers to leave occupations because of poor matches, but it does not
explain how they choose new occupations, because independence means workers view all occupations as ex-ante
identical.

The final contribution of the paper is methodological. Allowing occupation specific ability
to be correlated across occupations comes at a huge computational cost. First, the number of
parameters increases with the number of occupations, and traditional approaches require
jointly maximizing over this large set of parameters. Second, since abilities are unobserved, we are required
to integrate this high dimensional correlated unobservable out of the data. Finally, the size of the
state space is overwhelming, containing all potential beliefs about occupational abilities, such that
solving the dynamic model is extremely difficult.

These three features are challenging individually, and combined they make traditional methods
computationally infeasible. I address these challenges by estimating the model using the expectation
and maximization (EM) algorithm. The EM algorithm greatly simplifies the problem by
allowing many of the model parameters to be estimated separately, avoiding joint maximization.
Furthermore, it eliminates the need to perform the high dimensional correlated integration and
replaces it with a set of single dimensional integrals, which conveniently admit a closed form
expression. Perhaps most important, by utilizing the EM algorithm the computational burden
increases only linearly in the number of occupations (as opposed to exponentially with traditional
methods). This allows the empirical model to accommodate a large number of occupations and avoids
coarsely aggregating occupations into a smaller set. 4 Finally, as shown in Arcidiacono and Miller
(2010), the EM algorithm provides a means to recover consistent estimates of conditional choice
probabilities (CCPs) with unobserved state variables, which can be used to tractably estimate the
dynamic structural choice model.

I estimate the model using the 1997 cohort of the National Longitudinal Survey of Youth 
(NLSY97). The empirical model allows for a large choice set where individuals choose among ten 
occupations. The NLSY97 data are invaluable for a model that requires richer occupational detail.
A well-documented issue with its counterparts is the prevalence of spurious occupational changes,
which is largely due to how the occupation data are gathered. The occupation instruments in the
NLSY97 are collected in a way that reasonably avoids these measurement issues.

4 Avoiding coarse aggregation of occupations (e.g. blue collar and white collar) is highly desirable in the context of 
the learning model. Aggregation assumes that human capital (i.e. schooling, experience, and occupational ability) is 
perfectly transferable among the grouped occupations, which if untrue will bias many of the primitives of the learning 
model. For example, if an individual learns they have a bad match as a restaurant waiter and changes careers to 
become a welder, which they learn they have a high ability in, then when these occupations are aggregated to blue 
collar, the model will interpret these wage increases as returns to tenure when they are actually returns to search. A 
more thorough discussion of misspecification bias is presented in appendix A. 
The estimates of the distribution of abilities suggest that workers can potentially achieve large
wage increases by finding occupations they are well suited for. Furthermore, the estimates of the
covariance imply that correlation is important to the learning process. In addition, the correlation
between schooling ability and the occupational abilities suggests that as workers learn about their
schooling ability, this generates very different beliefs about expected abilities across occupations.

Using the parameter estimates, I simulate careers to further investigate the role of sorting on
ability by educational group and its influence on wage growth. The simulations show that high
school dropouts and high school graduates are more likely to change occupations and more likely
to sort on ability than college graduates. Consequently, workers without a college degree achieve
the greatest increases in wages due to sorting: within the initial years of their careers (8-9
years of labor market experience), sorting on ability accounts for a wage increase of 9.5% for high
school dropouts and 7.5% for high school graduates, compared to 5.2% for college graduates. Previous
work (e.g. Miller (1984)) concluded that more informed career decisions resulted in the infrequent 
occupational mobility of college graduates. However, the results from the simulations suggest 
a different interpretation, which is that college graduates are less likely to change occupations 
because sorting on ability is less important to these workers and occupational changes are driven 
by other factors. For example, a college graduate may be more likely to stay in a poorly matched 
occupation because the returns to their college degree are higher in that occupation. 

A number of recent papers have aimed at allowing for correlated learning in empirical work.
Antonovics and Golan (2011) and Sanders (2010) allow for correlation in occupational learning models.
Their method utilizes information in the Dictionary of Occupational Titles to construct a measure of
manual and cognitive skills for each occupation following Yamaguchi (2010). The correlation matrix
is constructed from these two measured factors and learning is reduced to these two dimensions.
The empirical approach in this paper is to have each occupation represent its own factor, so that
correlated learning is over J factors (the number of occupations in the economy). In addition, the
correlation among these J factors is estimated rather than drawn from auxiliary data. Dickstein
(2011) uses a different approach to allow for correlated learning among anti-depressant medications.
He incorporates correlation into the dynamic allocation index rule used in Miller (1984). One
drawback of this approach is that any additional factors (observed or unobserved) cannot affect
the dynamic decision process because the optimal choice is based solely on learning parameters,
abstracting from anything else in the utility function (in particular any unobserved random utility
shocks).

The outline of the paper is as follows. In section 2, I discuss the details of the occupational
choice model with ability matching. Section 3 outlines the two stage estimation strategy using
the EM algorithm that uncovers the structural parameters. Section 4 discusses the sample
extracted from the NLSY97 data. Section 5 presents the empirical results. Section 6 simulates data
from the parameter estimates to investigate the relationship between sorting on ability and wage
growth, with an emphasis on its link to educational attainment. Finally, section 7 concludes.

2 Ability Matching Model of Occupation Choice 

This section outlines the dynamic career decision model. The essential feature of the model is 
that workers vary in their innate skills and abilities for each occupation in the economy and most 
importantly, these unique occupation specific abilities are unknown to the worker when they begin 
their career. Workers are able to learn about their occupation specific abilities through direct 
experience. In addition, as workers learn about their ability in one occupation, they will be
able to make inferences about their abilities in all occupations by exploiting a known correlation 
structure between occupational abilities. This section first describes the occupational ability match 
vector and how it relates to wages, as well as the learning process. Then, it details the dynamic 
occupational choice model. 

2.1 Ability Matches 

The economy consists of J distinct (but potentially related) occupations. Workers have an innate
ability for each of the J occupations. We can uniquely characterize an individual by their vector
of ability endowments,

$$\mathcal{M}_i = [\mu_{i1}, \mu_{i2}, \dots, \mu_{iJ}, \mu_{is}] \tag{1}$$

where $\mu_{i1}$ represents individual i’s ability in occupation 1, with $\mu_{i2}$ being their ability in occupation
2, etc. Two important features of the ability endowment drive career decisions. First, the ability
vector is initially unknown to the individual, and second, the components of the ability endowment
are not independent.

The final term in the ability vector, $\mu_{is}$, represents individual i’s ability or preference for school.
Similar to Belzil and Hansen (2002) and Keane and Wolpin (1997), this is a heterogeneous term which
affects the utility for attending school in the form of additional cost. 5 This component will play a
prominent role in the worker’s problem because, like the occupational abilities, school ability is not
independent of the other components of the ability vector.

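The ability match vector is distributed multivariate normal in the population, parameterized
as,

$$\mathcal{M}_i \sim N\big(\mathbb{E}(\mathcal{M}_i), \mathbb{V}(\mathcal{M}_i)\big), \qquad
\mathbb{E}(\mathcal{M}_i) = \begin{bmatrix} \Gamma_{(J \times 1)} \\ \gamma_{s\,(1 \times 1)} \end{bmatrix}_{(J+1 \times 1)}, \qquad
\mathbb{V}(\mathcal{M}_i) = \begin{bmatrix} \Delta_{(J \times J)} & \delta_{sJ\,(J \times 1)} \\ \delta_{sJ}' & \sigma^2_{s\,(1 \times 1)} \end{bmatrix}_{(J+1 \times J+1)} \tag{2}$$

where $\Gamma$ is a vector containing the mean of occupational abilities in the population. $\gamma_s$ represents
the mean of schooling ability. The covariance is partitioned into three terms, $\Delta$, $\sigma_s^2$, and $\delta_{sJ}$. $\Delta$
is the occupation ability covariance matrix. Its dimension is J×J. $\sigma_s^2$ is the variance of schooling
ability, and $\delta_{sJ}$ is the covariance between schooling ability and each of the occupational abilities.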

If workers have rational expectations, this covariance structure is going to play a critical role
in the workers’ learning process. Specifically, workers will not only learn about their ability in the
occupations they have direct experience in, but they will also be able to exploit this covariance 
structure to make inferences about their abilities in other occupations. Likewise, if schooling ability 
is correlated with any of the occupational abilities, as workers learn about their schooling ability, 
this will also provide valuable information about their abilities in the various occupations. 

5 The model is agnostic about what this term actually represents; however, for consistency it will be referred to as
schooling ability (where it could also be referred to as schooling match, schooling preference, or schooling cost).

The covariance structure is an important element in the model because its sparseness (denseness)
represents the importance of comparative (absolute) advantage. In one case, if abilities are
highly correlated, then efficient search becomes less important since the individual’s problem effectively
reduces to a single dimension. Any path of experience will yield essentially the same
information, as workers are discovering whether they are a high type or a low type. In the other
extreme case, if abilities are only loosely correlated, then efficient search becomes a critical part of
the career process as workers seek to find their comparative advantage. In this case, the experience
path matters. As workers gain labor market experience, it becomes more costly for them to
experiment with new careers because this likely results in the loss of accumulated human capital
in their current occupation. The fact that searching later in the career is more costly places more
importance on strategically finding a good occupational match early in the career.

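Going forward, it is assumed that when individuals begin making decisions in the model, their
schooling ability is known. 6 Knowing their schooling ability, $\mu_{is}$, workers can use the distributional
parameters to form updated beliefs about their occupational abilities. Letting $\{\mathbb{E}_{it}(\mathcal{M}_i), \mathbb{V}_{it}(\mathcal{M}_i)\}$
indicate the mean and variance, which completely characterize their beliefs about their ability
vector given information up to date t, their beliefs at (t = 1) are described by,

$$\begin{aligned}
\{\mathbb{E}_{i1}(\mathcal{M}_i), \mathbb{V}_{i1}(\mathcal{M}_i)\}
&= \{\mathbb{E}(\mathcal{M}_i), \mathbb{V}(\mathcal{M}_i) \mid \mu_{is}\} \\
&= \left\{ \begin{bmatrix} \Gamma \\ \gamma_s \end{bmatrix}, \begin{bmatrix} \Delta & \delta_{sJ} \\ \delta_{sJ}' & \sigma_s^2 \end{bmatrix} \,\middle|\, \mu_{is} \right\} \\
&= \left\{ \Gamma + \frac{\delta_{sJ}}{\sigma_s^2}(\mu_{is} - \gamma_s),\; \Delta - \frac{\delta_{sJ}\delta_{sJ}'}{\sigma_s^2} \right\}
\end{aligned} \tag{3}$$
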


where the last line of equation (3) is the conditional distribution of a multivariate normal, whose 
resulting distribution is also multivariate normal. 
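
As a minimal numerical sketch of the conditioning step in equation (3), the fragment below computes the period-1 belief mean and covariance from a known schooling ability. The function name, the argument names (`Gamma`, `gamma_s`, `Delta`, `delta_sJ`, `sigma_s2`, used here as labels for the partitioned mean and covariance), and all numbers are illustrative assumptions, not estimates from the paper.

```python
import numpy as np

def initial_beliefs(Gamma, gamma_s, Delta, delta_sJ, sigma_s2, mu_is):
    """Condition the occupational-ability prior on a known schooling ability.

    Gamma    : (J,)   population mean of occupational abilities
    gamma_s  : scalar population mean of schooling ability
    Delta    : (J, J) covariance of occupational abilities
    delta_sJ : (J,)   covariance between schooling and each occupational ability
    sigma_s2 : scalar variance of schooling ability
    mu_is    : scalar the individual's (known) schooling ability
    """
    Gamma = np.asarray(Gamma, dtype=float)
    Delta = np.asarray(Delta, dtype=float)
    delta_sJ = np.asarray(delta_sJ, dtype=float)
    # Standard conditional mean and covariance of a multivariate normal.
    mean = Gamma + delta_sJ * (mu_is - gamma_s) / sigma_s2
    cov = Delta - np.outer(delta_sJ, delta_sJ) / sigma_s2
    return mean, cov

# Hypothetical two-occupation economy: only occupation 0 covaries with schooling.
Gamma = np.array([0.0, 0.0])
Delta = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
delta_sJ = np.array([0.02, 0.0])
mean, cov = initial_beliefs(Gamma, 0.0, Delta, delta_sJ, 0.04, mu_is=0.2)
print(mean)  # belief mean shifts only for the occupation correlated with schooling
print(cov)   # and only that occupation's prior variance shrinks
```

In this made-up example, an above-average schooling ability raises the expected ability in occupation 0 and leaves the belief about occupation 1 unchanged, which is the kind of heterogeneity in initial beliefs the text describes.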

This generates an important degree of heterogeneity across individuals in their beliefs about 
their occupation specific abilities. Likewise, it creates an important link between education decisions 
and occupational search patterns. 

6 This may be due to the fact that they either actually know this value, or they have learned it through a sufficient 
number of years of school. 



2.2 Wages and Learning 


Workers possess different types of accumulated human capital (e.g. education and work experience).
Each occupation utilizes these forms of human capital in a unique way, so the individual’s
productivity will vary by occupation. In a similar way, their occupation specific ability will also
affect productivity. Following the literature on productivity wages, if a worker is employed in occupation
$j \in J$, they are paid their occupation specific marginal product with log-wage described
by,

$$w(h_{it}, \mu_{ij}, \eta_{ijt}) = h_{it}\theta_j + \mu_{ij} + \eta_{ijt} \tag{4}$$

where i subscripts the individual, t the time period, and j the occupation. The term $h_{it}$ represents
the individual’s vector of accumulated human capital, including experience in the various
occupations and education at date t. $\theta_j$ is the occupation specific return for the different types of
accumulated human capital. Their occupation specific ability is represented by $\mu_{ij}$. The term $\eta_{ijt}$
is an individual, occupation specific, transitory technology shock to wages.

Since occupational ability is additive in the log-wage equation, it has a very clear interpretation
as a percentage difference in wages, such that an individual with a value of $\mu_{ij} = 0.1$ enjoys a 10%
higher wage in occupation j, all else equal, while an individual with $\mu_{ij} = -0.1$ has 10% lower
wages.

When an individual works in occupation j, they observe their wage. Following Miller (1984)
and Jovanovic (1979), what they do not observe is the individual values of the random variables
$\mu_{ij}$ and $\eta_{ijt}$. They can only observe their sum by differencing, $(w_{ijt} - h_{it}\theta_j) = (\mu_{ij} + \eta_{ijt})$. Workers
are extremely interested in knowing their occupational ability because it is a potentially large (positive or
negative), persistent component affecting future returns in that occupation. Given this noisy signal,
individuals make inferences about their abilities via Bayes’ rule, using their priors and knowledge
of the distribution of the technology shock.

The technology shock $\eta_{ijt}$ represents the noise of the ability signal. This random variable is
normally distributed with an occupation specific variance and is i.i.d. across time and individuals,

$$\eta_{ijt} \sim N(0, \sigma_j^2) \quad \forall\, j \in \{1, 2, \dots, J\}$$



Occupations differ in the variance of the technology shock, implying that productivity in some
occupations may be more or less influenced by transitory factors. In occupations with low $\sigma_j^2$, the
individual will be able to infer their occupation ability match faster than in occupations with high
transitory noise. Taken to the extreme, if $\sigma_j^2 = 0$, then the worker will be able to infer their ability
from a single wage observation. From the worker’s perspective, one dimension along which occupations can be
classified is fast learning versus slow learning, which is dictated by $\sigma_j^2$.
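
The fast versus slow learning distinction can be illustrated with the textbook normal-normal result for a single ability considered in isolation (ignoring cross-occupation correlation): after n wage signals, the posterior variance is 1 / (1/prior variance + n/noise variance). The sketch below uses hypothetical variances; the function name and numbers are assumptions for illustration only.

```python
def posterior_variance(prior_var, noise_var, n_signals):
    """Posterior variance of one ability after n i.i.d. normal wage signals."""
    if n_signals == 0:
        return prior_var          # no signal: beliefs are unchanged
    if noise_var == 0.0:
        return 0.0                # noiseless signal: ability revealed at once
    return 1.0 / (1.0 / prior_var + n_signals / noise_var)

prior_var = 0.09
# Hypothetical "fast learning" (low noise) and "slow learning" (high noise) occupations.
fast = [posterior_variance(prior_var, 0.01, n) for n in range(4)]
slow = [posterior_variance(prior_var, 0.25, n) for n in range(4)]
for n, (f, s) in enumerate(zip(fast, slow)):
    print(n, round(f, 4), round(s, 4))
```

With these made-up values, a single year in the low-noise occupation removes most of the prior uncertainty, while the high-noise occupation still leaves substantial uncertainty after three years.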

A second dimension in which occupations differ is how informative their abilities are about the 
abilities in other occupations. Workers distinguish between high information occupations and low 
information occupations. High information occupations are not only informative about the own 
occupation ability but also provide information about abilities across occupations. Low information 
occupations are those that only provide information about the own occupation ability. 

In the model, workers only observe their productivity (wage) after they choose an occupation. 7
This means that when workers make career decisions they choose occupations based on expected
wages, taking into account what they have learned so far in their career. The details of the decision
process are outlined in section 2.3; the emphasis here is to highlight that workers are precluded from
receiving multiple wage signals simultaneously and must incur the cost of accepting a job in order
to acquire information. If this were not the case, workers would observe their productivities in every
occupation in each period, so learning would simply be a product of time and individuals
would require no strategy to efficiently seek out information.

7 Other models of occupational choice like Keane and Wolpin (1997) and Sullivan (2010) assume individuals observe
all of their potential wages for each occupation in each period prior to making a choice.

When a worker observes an ability signal they update their beliefs recursively using Bayes’ rule.
Their posterior beliefs are a function of their initial beliefs, the noisy signal of their ability, and the
known distribution of the technology shock corresponding to the occupation they worked in. DeGroot
(1970) outlines the updating formulas given a normally distributed prior and normally distributed
signal. The updated beliefs can be described in matrix form using the following notation. Let
$d_{it-1} = j$ indicate that individual i worked in occupation j in period t - 1. Then,

$$\zeta_{it-1\,(J \times 1)} = \begin{bmatrix} \vdots \\ \zeta_{it-1,j} \\ \vdots \end{bmatrix}, \qquad
\zeta_{it-1,j} = \begin{cases} w_{ijt-1} - h_{it-1}\theta_j & \text{if } d_{it-1} = j \\ 0 & \text{otherwise} \end{cases}$$

$$\Sigma_{it-1\,(J \times J)} = \operatorname{diag}\big(\Sigma_{it-1,1}, \dots, \Sigma_{it-1,J}\big), \qquad
\Sigma_{it-1,j} = \begin{cases} 1/\sigma_j^2 & \text{if } d_{it-1} = j \\ 0 & \text{otherwise} \end{cases}$$

$\zeta_{it-1}$ is a sparse J×1 vector that contains all zeros except in the j-th position, which has the
ability signal of occupation j. $\Sigma_{it-1}$ is a sparse matrix with zeros everywhere except in the (j, j)
element, which contains the inverse of the occupation j technology shock variance.

Given this vector and matrix, and using the previous period’s beliefs about occupation ability, which
are multivariate normal described by their mean $\mathbb{E}_{it-1}(\mathcal{M}_i)$ and covariance $\mathbb{V}_{it-1}(\mathcal{M}_i)$, the posterior
beliefs are multivariate normal with mean and variance defined as,

$$\begin{aligned}
\mathbb{E}_{it}(\mathcal{M}_i) &= \left[\mathbb{V}_{it-1}(\mathcal{M}_i)^{-1} + \Sigma_{it-1}\right]^{-1}\left[\mathbb{V}_{it-1}(\mathcal{M}_i)^{-1}\,\mathbb{E}_{it-1}(\mathcal{M}_i) + \Sigma_{it-1}\zeta_{it-1}\right] \\
\mathbb{V}_{it}(\mathcal{M}_i) &= \left[\mathbb{V}_{it-1}(\mathcal{M}_i)^{-1} + \Sigma_{it-1}\right]^{-1}
\end{aligned} \tag{5}$$

The previous period’s beliefs take into account the entire past sequence of outcomes and choices as
well as their initial priors; therefore it is sufficient to condition only on the previous period’s beliefs,
rather than the entire history. Also, equation (5) demonstrates that if an individual receives no
signal (does not work) then the update simply returns the prior.
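
The recursive update in equation (5) can be sketched in a few lines, under the assumption that beliefs are stored as a mean vector and a positive definite covariance matrix over the J occupational abilities. The function name, argument names, and numbers below are hypothetical and chosen only to show how a signal from one occupation moves beliefs about a correlated occupation.

```python
import numpy as np

def update_beliefs(mean, cov, occupation, signal, noise_var):
    """One precision-weighted Bayesian update of normal beliefs.

    occupation : index j worked last period, or None if the individual did not work
    signal     : observed wage residual (log wage minus the human capital part)
    noise_var  : (J,) technology shock variances, one per occupation
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    J = len(mean)
    zeta = np.zeros(J)            # sparse signal vector
    Sigma = np.zeros((J, J))      # sparse signal precision matrix
    if occupation is not None:
        zeta[occupation] = signal
        Sigma[occupation, occupation] = 1.0 / noise_var[occupation]
    prior_precision = np.linalg.inv(cov)
    post_cov = np.linalg.inv(prior_precision + Sigma)
    post_mean = post_cov @ (prior_precision @ mean + Sigma @ zeta)
    return post_mean, post_cov

prior_mean = np.array([0.0, 0.0])
prior_cov = np.array([[0.04, 0.03],
                      [0.03, 0.09]])
noise_var = np.array([0.04, 0.04])

# Work in occupation 0 and observe a residual of 0.2.
post_mean, post_cov = update_beliefs(prior_mean, prior_cov, 0, 0.2, noise_var)
# No work: the update should hand back the prior.
same_mean, same_cov = update_beliefs(prior_mean, prior_cov, None, 0.0, noise_var)
print(post_mean)
print(post_cov)
```

In this example the worker never tries occupation 1, yet the positive covariance raises their expected ability there and lowers its variance, which is the indirect learning channel described above.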


2.3 A Dynamic Model of Career Decisions with Learning 

The dynamic occupational choice model shares a similar framework to other structural occupational 
choice models like Keane and Wolpin (1997) and Sullivan (2010). The distinguishing difference is 
that individuals now face uncertainty over their occupation specific abilities, and given their forward 
looking behavior, workers seek an optimal career path to efficiently learn about their unknown 
abilities. 

From age 16 (t = 1) until age T, individuals make career decisions to maximize expected lifetime
utility. In each period, individuals make a discrete decision $d_{it} \in \{u, s, 1, 2, \dots, J\}$ among
J + 2 mutually exclusive alternatives, where d = u is unemployment (home production), d = s is
attending school, and d = j, for each $j \in \{1, 2, \dots, J\}$, is employment in occupation j. Each choice
is associated with a current period utility $U_d(\cdot)$ and a non-pecuniary random utility shock $\varepsilon_d$, which
is additive and i.i.d. across choices and time, so the per-period pay-off for choice d is $U_d(\cdot) + \varepsilon_d$.

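2.3.1 Employment: d ∈ {1, 2, …, J}

The per-period utility associated with employment in occupation j is,

$$\begin{aligned}
U_{ijt}(\cdot) &= \alpha_{j1} + \alpha_{j2}(d_{it-1} \neq j) + \alpha_{j3}(Expr_j = 0) \\
&\quad + \alpha_w\,\mathbb{E}\big(u(\exp(w_{ijt})) \mid h_{it}, \mathbb{E}_{it}, \mathbb{V}_{it}\big) \qquad \forall\, j \in \{1, 2, \dots, J\} \\
\text{where } u(\exp(w)) &= \frac{\exp(w)^{(1-\rho)}}{1-\rho}, \qquad \rho \in [0, \infty)
\end{aligned} \tag{6}$$
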


This expression contains a constant, $\alpha_{j1}$, which represents the non-pecuniary benefits of being
employed in occupation j; an entry cost, $\alpha_{j2}$, if the worker did not work in occupation j in the
previous period; and $\alpha_{j3}$, an additional entry cost if the worker has never worked in occupation j.
Each parameter is occupation specific except for the parameter on expected utility of wages, $\alpha_w$,
which is the same regardless of the source of the wage. 8

An important departure from previous work is that workers are not assumed to be maximizing 
expected lifetime wages. Rather, because workers face a high degree of uncertainty in this model, 
it is likely that risk plays an important role in career decisions. Assuming workers cannot save, 
they put all of their wages into consumption. A risk averse worker may be unwilling to try their
hand in a new occupation in which there is a good chance they may receive a very low wage and
thus lower consumption. The iso-elastic utility specification in equation (6) allows risk over wages
to influence workers’ decisions. Risk aversion is measured by the constant relative risk aversion
parameter $\rho$. If $\rho = 0$, then workers are expected wage maximizers.

8 Recall that equation (6) has exp(w) as the argument because w represents the log wage.

Following the expression from equation (4), log wages are given by,

$$\begin{aligned}
w(h_{it}, \mu_{ij}, \eta_{ijt}) &= h_{it}\theta_j + \mu_{ij} + \eta_{ijt} \\
&= \theta_{j1} + \theta_{j2}(ed) + \theta_{j3}(ed^2/100) + \theta_{j4}(Expr_j > 0) \\
&\quad + \theta_{j5}(Expr_j) + \theta_{j6}(Expr_j^2/100) + \theta_{j7}\Big(\sum_{k \neq j} Expr_k\Big) + \mu_{ij} + \eta_{ijt}
\end{aligned} \tag{7}$$

where ed is the number of years of education, $Expr_j$ is the total labor market experience in occupation
j, and $\sum_{k \neq j} Expr_k$ is the sum of other labor market experience excluding j.

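Deriving the expression for the expected utility of wages for occupation j using the expression
in equation (4) yields,

$$\begin{aligned}
\mathbb{E}\big(u(\exp(w_{ijt})) \mid h_{it}, \mathbb{E}_{it}, \mathbb{V}_{it}\big)
&= \mathbb{E}\left[\frac{\exp(w_{ijt})^{(1-\rho)}}{1-\rho} \,\middle|\, h_{it}, \mathbb{E}_{it}, \mathbb{V}_{it}\right] \\
&= \mathbb{E}\left[\frac{\exp\big((1-\rho)(h_{it}\theta_j + \mu_{ij} + \eta_{ijt})\big)}{1-\rho} \,\middle|\, h_{it}, \mathbb{E}_{it}, \mathbb{V}_{it}\right] \\
&= \frac{1}{1-\rho}\exp\left((1-\rho)\big(h_{it}\theta_j + \mathbb{E}_{it}(\mu_{ij})\big) + \frac{(1-\rho)^2}{2}\big(\mathbb{V}_{it}(\mu_{ij}) + \sigma_j^2\big)\right)
\end{aligned} \tag{8}$$
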


where the last line follows from the fact that the sum of the two unknown random components,
the occupation j ability, $\mu_{ij}$, and the transitory productivity shock, $\eta_{ijt}$, is normally distributed,
making wages log-normally distributed. The term $\mathbb{E}_{it}(\mu_{ij})$ in equation (8) represents the individual’s
marginal expectation about their ability in occupation j given all of the available information at
time t, and likewise $\mathbb{V}_{it}(\mu_{ij})$ represents the marginal variance.
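
The closed form in equation (8) is the moment generating function of a normal random variable evaluated at (1 - ρ), and it can be checked numerically. The sketch below compares the closed form with a Monte Carlo average; the function name and all parameter values are hypothetical, and the formula does not cover ρ = 1.

```python
import math
import numpy as np

def expected_crra_utility(h_theta, mean_mu, var_mu, noise_var, rho):
    """Closed-form expected iso-elastic utility of a log-normal wage (rho != 1).

    h_theta   : human capital component of the log wage
    mean_mu   : current belief mean about ability in the occupation
    var_mu    : current belief variance about that ability
    noise_var : variance of the transitory technology shock
    """
    if rho == 1.0:
        raise ValueError("rho = 1 is not covered by this formula")
    a = 1.0 - rho
    return math.exp(a * (h_theta + mean_mu) + 0.5 * a**2 * (var_mu + noise_var)) / a

h_theta, mean_mu, var_mu, noise_var, rho = 2.5, 0.1, 0.04, 0.09, 2.0
closed_form = expected_crra_utility(h_theta, mean_mu, var_mu, noise_var, rho)

# Monte Carlo check: draw ability plus shock jointly as one normal term.
rng = np.random.default_rng(0)
log_wage = h_theta + rng.normal(mean_mu, math.sqrt(var_mu + noise_var), size=400_000)
monte_carlo = float(np.mean(np.exp(log_wage) ** (1.0 - rho) / (1.0 - rho)))

print(closed_form, monte_carlo)
```

Setting `rho = 0.0` in the same function returns the expected wage level, consistent with the remark that workers are then expected wage maximizers.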

Contemporaneously, risk aversion is an additional search friction similar to the other entry costs. 
However, risk aversion may also play a role intertemporally as occupations deliver very different 
consumption streams through the returns to accumulated human capital. If workers are risk averse, 
they may find occupations with initially steeper returns to tenure more attractive than those with 
returns that have a more gradual profile. 
2.3.2 Schooling and Unemployment: d ∈ {s, u}

If the individual chooses schooling in period t they receive flow utility,

$$U_{ist}(\cdot) = \alpha_{s1} + \alpha_{s2}(AttCOL) + \alpha_{s3}(ReentryHS) + \alpha_{s4}(ReentryCOL) + \alpha_{s5}\,\mu_{is}$$

Schooling preference depends on (AttCOL), which indicates whether the individual is attending
college; (ReentryHS) and (ReentryCOL) represent the re-entry costs if the individual was not
enrolled in school in the previous period. Finally, the utility of schooling will be affected by school
ability $\mu_{is}$.

The mean flow utility of unemployment is normalized to zero (i.e. $U_{iut} = 0$).

2.3.3 Value Functions 

Let $S_{it}$ represent individual i’s state vector at time t. This includes all of the variables relevant to
the individual’s decision, which given the current specification are years of education, labor market
experience for each occupation j, the previous period’s decision, $d_{it-1}$, and their beliefs about their
occupational ability, $\mathbb{E}_{it}(\mathcal{M}_i)$ and $\mathbb{V}_{it}(\mathcal{M}_i)$. Furthermore, let $\varepsilon_{it}$ be the J+2 current period random
utility shocks, $\varepsilon_{ict}$ for $c \in \{u, s, 1, 2, \dots, J\}$.

Workers seek to maximize the expected discounted sum of future utility. Let $V_t(S_{it}, \varepsilon_{it})$ be
the value function for a worker in a particular state at time t when they make optimal decisions.
Given that time is finite, $V_{T+1}(\cdot) = 0$. With discount factor $\beta$ on future utility, we can represent the value
function with the following recursive Bellman equation,

$$V_t(S_{it}, \varepsilon_{it}) = \max_{c \in \{u, s, 1, \dots, J\}} \Big\{ U_c(S_{it}) + \varepsilon_{ict} + \beta\,\mathbb{E}\big[V_{t+1}(S_{it+1}, \varepsilon_{it+1}) \mid S_{it}, d_{it} = c\big] \Big\} \tag{9}$$


The expectation is taken over next period’s utility shocks, $\varepsilon$, as well as all possible realizations of the
occupational ability signals if they choose one of the employment options, which affects their
beliefs in the next period. The other state variables, education and labor market experience, evolve
deterministically, but are endogenous to the choice.

Letting $v_{ct}(\cdot)$ represent the conditional value function, which is the mean pay-off for each choice
excluding the current period’s error term, defined as,

$$v_{ct}(S_{it}) = U_c(S_{it}) + \beta\,\mathbb{E}\big[V_{t+1}(S_{it+1}, \varepsilon_{it+1}) \mid S_{it}, d_{it} = c\big] \qquad \forall\, c \in \{u, s, 1, 2, \dots, J\} \tag{10}$$

then the individual’s optimal choice in period t, $d^*_{it}$, is determined by,

$$d^*_{it} = \arg\max_{c \in \{u, s, 1, \dots, J\}} \big\{ v_{ct}(S_{it}) + \varepsilon_{ict} \big\} \tag{11}$$


Dynamics play an important role in the individual’s decision process. Since the information 
the individual receives is conditional on the choice, individuals strategically make career decisions 
which are likely to give them the best information. In Miller (1984) individuals have an incentive 
early on to try out occupations where the ability match distribution has a low mean, but high 
variance because this could potentially lead to very valuable information if the individual’s talent 
happens to be in the upper tail of the distribution. 9 Consequently, embedded in this maximization 
problem is an optimal search strategy in which individuals strategically search over occupations to 
efficiently learn about their innate abilities. 


3 Estimation 


The ability matching model possesses a number of features that make estimation challenging. These problems are rooted in the fact that many components in the model grow exponentially complex in the number of occupations. A popular method seen in the literature to address this curse of dimensionality is to reduce the number of occupations by coarsely aggregating occupations into a smaller set. Unfortunately, in the context of the learning model this is a particularly dangerous approach because there is a serious risk of misspecification, which will bias nearly all of the model parameters. 10 

9 Most interesting is that relaxing the independence assumption actually generates a potentially different set of optimal career paths from those found in the previous literature. Consider a worker choosing between two occupations. The distribution of the unknown match value for occupation A is characterized by a high variance and low mean, while the distribution for occupation B has a low variance and high mean. The optimal career path described in Miller (1984) is for workers to always initially sort into occupation A, the high variance occupation, because this is the occupation with the greatest informational benefit. However, if the occupational matches are correlated, then the worker's optimal career path may be to initially sort into occupation B and, through the correlation structure, learn about their match value in occupation A. 

In order to estimate the model allowing for a rich set of occupations, we require a new approach which explicitly addresses the curse of dimensionality. The curse of dimensionality presents itself in a number of ways. First, given that we do not observe the high dimensional vector of occupational and schooling ability, we need to integrate over these correlated unobservables. With many occupations this high-dimensional integration becomes increasingly difficult. The second computational problem stems from the fact that nearly all of the parameters in the model are occupation specific. With many occupations, this corresponds to many parameters. The burden of jointly estimating all of these parameters simultaneously poses a major computational challenge, in particular when using numerical gradient based methods. Finally, the state space in the model is continuous along multiple dimensions, as it includes all potential combinations of beliefs about occupational abilities. Even if beliefs were the only state variable and, for each occupation, beliefs were crudely gridded to 10 points, then with 10 occupations the state space would contain over 10 billion combinations of beliefs. Traditional full solution methods for solving dynamic discrete choice problems will not be feasible. 
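To make the size of this state space concrete, the following back-of-the-envelope count may help. It is only a sketch: it assumes a single gridded belief dimension per occupation, as in the illustration above, and the function name `belief_grid_size` is hypothetical. Gridding additional belief moments (for example variances) would only enlarge the count.

```python
def belief_grid_size(points_per_occupation: int, n_occupations: int) -> int:
    """Number of belief combinations when each occupation's belief
    is placed on its own grid (one gridded dimension per occupation)."""
    return points_per_occupation ** n_occupations


# The illustration in the text: 10 grid points, 10 occupations.
size_10 = belief_grid_size(10, 10)
# Adding a single occupation multiplies the state space by 10.
size_11 = belief_grid_size(10, 11)
print(size_10, size_11)
```

The exponential growth in the number of occupations is what rules out solving the model at every grid point.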

The estimation strategy utilizes the Expectation and Maximization (EM) algorithm to overcome these computational challenges. This approach breaks the curse of dimensionality so that the computational complexity only grows linearly in the number of occupations, allowing many occupations into the empirical model. Compared to traditional maximum likelihood, which attempts to solve this extremely complex problem all at once, the EM algorithm can be viewed as a process which breaks the larger (infeasible) problem up into a couple of smaller (feasible) problems, in particular an expectation step and a maximization step. By iterating between these two interlinked steps, Dempster et al. (1977) show that the parameters converge to the same solution as numerical maximum likelihood. 11 

10 Misspecification bias and its consequences are discussed in appendix A. 

11 A local maximum. 

With respect to the computational challenges outlined above, the EM algorithm breaks the J-dimensional correlated integral into J single-dimensional integrals. Furthermore, the maximization becomes additively separable for each of the wage equations and the choice equation, allowing us to estimate each set of parameters separately. Finally, as demonstrated by Arcidiacono and Miller (2010), the EM algorithm provides a means to recover conditional choice probabilities (CCPs) from the data in the case of unobserved state variables. Hotz and Miller (1993) and Arcidiacono and Miller (2010) show that CCPs can be used as an alternative representation of the expected future value term, enabling us to solve the complicated dynamic structural model without solving the optimization at every point in the state space. 

The estimation is implemented in two stages. In the first stage, I use the EM algorithm to recover all of the model parameters except the structural utility parameters. Rather than re-solving the structural model at each iteration of the algorithm, I replace the structural choice probabilities with a flexibly specified dynamic reduced form. Then, using the parameter estimates from the first stage, I estimate the dynamic structural model in the second stage. This section is broken into three sub-sections. The first describes the likelihood function and the challenges associated with traditional maximum likelihood. The next sub-section describes the EM algorithm, including the first stage of estimation. Finally, using the first stage estimates, I outline the estimation of the dynamic structural model. 

Normalizations 

In addition to the standard normalizations associated with estimating discrete choice models, 12 some additional parameters of the model are not identified. Specifically, given that the ability matches are random variables, the mean of occupational ability will not be separately identified from the constant in the wage equation, and the mean of schooling ability will not be separately identified from the constant in the schooling utility. Therefore, the means of the ability vectors are set to zero. In addition, since the schooling match enters the utility for schooling as $\alpha_{s5}\mu_{is}$, the scale of schooling ability, $E(\mu_{is}^2)$, will not be identified separately from the coefficient $\alpha_{s5}$. Therefore, I normalize $E(\mu_{is}^2) = 1$. 

12 Only differences in utility functions are identified, and the variance of the random utility shock is not identified separately from the scale of the utility parameters. 


3.1 The Likelihood Function 

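The likelihood function for individual i is characterized as the joint probability of the observed career decisions $d_{it} \in \{u, s, 1, 2, \dots, J\}$ and realized wages, $w_{it}$, for t = 1 to T, taking expectations of the likelihood over the unobserved schooling and occupational ability, $\{\mu_{i1}, \mu_{i2}, \dots, \mu_{iJ}, \mu_{is}\}$. 

$$\mathcal{L}_i = \int_{\mu_s} \int_{\mu_1} \cdots \int_{\mu_J} \prod_{t=1}^{T} \left[ \Pr(d_{it} \mid x_{it}, b_{it}, \bar{\mu}_s, \Lambda, \Theta) \times \prod_{j=1}^{J} \Pr(w_{it} \mid x_{it}, \bar{\mu}_j, \Theta)^{1(d_{it} = j)} \right] dH(\bar{\mu}_1, \dots, \bar{\mu}_J, \bar{\mu}_s \mid \Delta, \delta_{sJ}) \qquad (12)$$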

In equation (12), $x_{it}$ denotes the observed data, which affect choices and wages, and $b_{it}$ represents the unobserved beliefs the individual has about their occupational ability. The structural parameters are summarized by $\Lambda = \{\alpha_1, \dots, \alpha_J, \alpha_w, \rho, \alpha_s\}$ and the wage parameters are summarized by $\Theta = \{\theta_1, \dots, \theta_J, \sigma_1, \dots, \sigma_J\}$. The wage parameters enter the choice probabilities because expected wages are a component of the employment flow utilities. 

With expressions for the probability of the observed choices and wages, we need to integrate the unobserved ability vector, whose density is represented by $H(\cdot)$, out of the likelihood. Given the normality assumptions in the model and the normalizations stated previously, $H(\cdot)$ is parameterized as, 

$$H(\bar{\mu}_1, \dots, \bar{\mu}_J, \bar{\mu}_s \mid \Delta, \delta_{sJ}) \sim N\left( 0, \begin{bmatrix} \Delta & \delta_{sJ} \\ \delta_{sJ}' & 1 \end{bmatrix} \right) \qquad (13)$$


A striking feature of the model is that the true occupational abilities do not directly enter into the probabilities of the observed choices. Instead, it is the beliefs about their occupational abilities, $b_{it}$, which drive choices. Intuitively, this means that two individuals with identical beliefs about their occupational abilities will make the same choices in probability, even though their occupational abilities may be completely different. Therefore, once we condition the choice probabilities on the beliefs, we can pull this component outside of the multiple integration over the occupational abilities. In rewriting the likelihood, it is then helpful to break the density $H(\cdot)$ into the unconditional distribution of schooling ability, $G(\cdot)$, and a conditional (on schooling ability) distribution of occupational ability, $F(\cdot)$, where, 

$$G(\bar{\mu}_s) \sim N(0, 1)$$

$$F(\bar{\mu}_1, \dots, \bar{\mu}_J \mid \bar{\mu}_s, \Delta, \delta_{sJ}) \sim N(\delta_{sJ}\bar{\mu}_s, \; \Delta - \delta_{sJ}\delta_{sJ}')$$
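The conditional distribution F follows from the standard formula for conditioning a joint normal, which simplifies here because schooling ability has unit variance and all means are zero. The short sketch below computes the conditional mean and covariance for made-up numbers; the function name and the example values are hypothetical and are not taken from the paper.

```python
def conditional_ability_moments(Delta, delta_s, mu_s):
    """Moments of occupational ability given schooling ability mu_s.

    Delta   : J x J covariance of occupational abilities (list of lists)
    delta_s : length-J covariances between each occupational ability
              and schooling ability
    Assumes zero means and Var(schooling ability) = 1, so that
    mean = delta_s * mu_s and cov = Delta - delta_s delta_s'.
    """
    J = len(delta_s)
    mean = [delta_s[j] * mu_s for j in range(J)]
    cov = [[Delta[a][b] - delta_s[a] * delta_s[b] for b in range(J)]
           for a in range(J)]
    return mean, cov


# Two occupations with illustrative (hypothetical) parameter values.
mean, cov = conditional_ability_moments(
    Delta=[[1.0, 0.6], [0.6, 2.0]], delta_s=[0.5, 1.0], mu_s=2.0)
print(mean, cov)
```

A higher schooling ability shifts the expected ability in every occupation in proportion to its covariance with schooling ability, and conditioning always weakly reduces the variances.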

The unobserved schooling ability plays an important role in the model. Not only does it dictate an individual's willingness to attend school, but it also influences their initial expectations about their occupational abilities. Rather than assuming this important unobservable is randomly distributed across individuals, it is common practice in the literature to condition this unobservable on potentially relevant information outside of the model. 13 For estimation, unobserved schooling ability is determined by, 

$$\mu_{is} = z_i\lambda + \nu_i \quad \text{s.t. } E(\mu_{is}) = 0 \text{ and } E(\mu_{is}^2) = 1 \qquad (14)$$

where $z_i$ is a vector of observed variables including highest grade completed by the mother and information about the highest grade completed by the individual at age 16. 14 $\lambda$ are parameters to be estimated, and the residual $\nu_i$ represents the remaining unobserved components and is distributed $N(0, \sigma_\nu^2)$. The equation is estimated with the normalizations as constraints. If the observed variables $z_i$ have no relation to $\mu_{is}$, then all of the variance will be absorbed by the unexplained error, $\nu_i$, and $\sigma_\nu = 1$. Therefore, schooling ability is distributed, 

$$G(\bar{\mu}_s \mid z_i, \lambda, \sigma_\nu^2) \sim N(z_i\lambda, \sigma_\nu^2) \qquad (15)$$

13 Keane and Wolpin (1997) allow the initial years of schooling at age 16 to affect the probability of the initial conditions. 

14 Specifically, $z_i$ includes the highest grade completed of the mother, an indicator if the highest grade completed of the mother is missing, an indicator if the highest grade completed by the individual at age 16 is less than 9, and an indicator if the highest grade completed by the individual at age 16 is greater than 9. 


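Pulling the choice probabilities out of the integration of the occupational abilities and replacing the joint density $H(\cdot)$ with the conditional density $F(\cdot)$ for occupational abilities and the density $G(\cdot)$ for schooling ability, the likelihood in equation (12) becomes, 

$$\mathcal{L}_i = \int_{\mu_s} \prod_{t=1}^{T} \Pr(d_{it} \mid x_{it}, b_{it}, \bar{\mu}_s, \Lambda, \Theta) \times \left[ \int_{\mu_1} \cdots \int_{\mu_J} \left( \prod_{t=1}^{T} \prod_{j=1}^{J} \Pr(w_{it} \mid x_{it}, \bar{\mu}_j, \Theta)^{1(d_{it} = j)} \right) dF(\bar{\mu}_1, \dots, \bar{\mu}_J \mid \bar{\mu}_s, \Delta, \delta_{sJ}) \right] dG(\bar{\mu}_s \mid z_i, \lambda, \sigma_\nu^2) \qquad (16)$$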


There are three issues which make directly maximizing the likelihood in equation (16) computationally infeasible. First is the integration of the high dimensional correlated unobservables. Second is the pervasiveness of the parameters. For example, the wage parameters enter into the wage probabilities, the choice probabilities, and the belief functions. Even more challenging is the fact that the distributional parameters for the integration also enter the choice probabilities, since they are components of the belief functions. Analytical gradients in this environment are extremely difficult to calculate. This forces us to rely on numerical gradients for maximization, which is extremely slow with such a large parameter space. 

The final challenge to directly maximizing equation (16) is that at each guess of the parameters, we need to solve the individual's optimization problem in order to evaluate the probabilities of the observed choices. Not only does this require solving for all potential career paths, but it also requires solving the value function for all potential realizations of occupational ability. Even using non-full solution interpolation methods like Keane and Wolpin (1994), with large T and large J, the complexity of the state space with such a large number of continuous variables is overwhelming, and even performing the backward recursion for one iteration of the maximization is extremely computationally intensive. 


3.2 The EM Algorithm 

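In order to estimate the model, I first consider a simplified version of the likelihood in equation (16) specified as, 

$$\mathcal{L}_i = \int_{\mu_s} \left[ \prod_{t=1}^{T} \prod_{c \in C} \Omega_c(x_{it}, \bar{\mu}_s)^{1(d_{it} = c)} \right] \times \left[ \int_{\mu_1} \cdots \int_{\mu_J} \left( \prod_{t=1}^{T} \prod_{j=1}^{J} \Pr(w_{it} \mid x_{it}, \bar{\mu}_j, \Theta)^{1(d_{it} = j)} \right) dF(\bar{\mu}_1, \dots, \bar{\mu}_J \mid \bar{\mu}_s, \Delta, \delta_{sJ}) \right] dG(\bar{\mu}_s \mid z_i, \lambda, \sigma_\nu^2) \qquad (17)$$

where $C = \{u, s, 1, \dots, J\}$. 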


In this version of the likelihood I have replaced the probabilities of the observed choices, which are rooted in the theoretical model and require solving the individual's optimization problem, with a simpler function, $\Omega$, that can be viewed as a reduced form representation of the dynamic choice probabilities. Rather than maximizing over the structural parameters that best fit the data, $\Omega$ is a flexible specification of the states, such that, 

$$\Omega_c(x_{it}, \bar{\mu}_s) = \Pr(d_{it} = c \mid x_{it}, b_{it}, \bar{\mu}_s, \Lambda, \Theta) \quad \text{for } c \in \{u, s, 1, 2, \dots, J\} \qquad (18)$$

$\Omega_c$ and its parameters have no real meaning other than that they characterize the observed choices extremely well. This can be achieved with a flexible enough specification of the state variables $x_{it}$ and $\bar{\mu}_s$. It is important to notice that neither the beliefs nor the wage parameters enter into the function $\Omega$. The reason beliefs are not included in this expression is that beliefs are simply complicated (unknown) functions of the observed data (i.e. past wages) and unobserved schooling ability. Including enough high order interactions of these variables, we will be able to distinguish how these terms influence choices without having to solve for the beliefs explicitly. Ideally, if the choice probabilities are expressed as a robust function of the states, then including the beliefs will be redundant. The same reasoning holds for excluding the wage parameters. Since expected wages are just unknown functions of $x_{it}$, their relevance in the choice probability should be accounted for with a flexible enough specification of $x_{it}$ in $\Omega$. 

The likelihood in equation (17) avoids the previously outlined problem of having to solve the dynamic optimization problem at every guess of the parameters. Including $\Omega$ acts as a dynamic selection correction term and allows us to consistently estimate the other parameters in the model. However, given the other challenges outlined (high dimensional integration, joint maximization of many parameters), maximizing equation (17) directly is impractical, if not infeasible. 15 

The EM algorithm provides a tractable alternative to maximizing equation (17) directly. The benefits of the EM algorithm are twofold. First, it allows us to estimate the choice parameters separately from the wage parameters, as well as each of the wage parameters independently. This avoids the computational challenge of maximizing all of the parameters simultaneously. Second, it accommodates a simple closed form solution for the integration of the occupational ability parameters such that solving for the wage parameters reduces to ordinary least squares. 

The specifics of the EM algorithm are best explained by reframing the likelihood as if all of the data were observed. To do this, we define two densities pertaining to the individual's ability vector. First, let 

$$g_i(\bar{\mu}_s) = \begin{cases} 1, & \text{if } \bar{\mu}_s = \mu_{is} \\ 0, & \text{otherwise} \end{cases}$$

be an individual i specific density, with a mass point of one if the input, $\bar{\mu}_s$, equals their true schooling ability and zero otherwise. Similarly, define the individual density, $f_i$, as, 

$$f_i(\bar{\mu}_1, \bar{\mu}_2, \dots, \bar{\mu}_J) = \begin{cases} 1, & \text{if } \bar{\mu}_j = \mu_{ij} \;\; \forall j \in \{1, 2, \dots, J\} \\ 0, & \text{otherwise} \end{cases}$$

which has a mass point of one if the input vector, $\bar{\mu}_1, \bar{\mu}_2, \dots, \bar{\mu}_J$, equals their true vector of occupational abilities. 

15 Infeasibility becomes more likely with large J because this introduces many more parameters to the model to flexibly estimate $\Omega$. 

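If we observed these functions, the likelihood of the data is, 

$$\mathcal{L}_i = \prod_{\mu_s} \left[ \prod_{t=1}^{T} \prod_{c \in C} \Omega_c(x_{it}, \bar{\mu}_s)^{1(d_{it} = c)} \times \prod_{\mu_1} \cdots \prod_{\mu_J} \left[ \prod_{t=1}^{T} \prod_{j=1}^{J} \Pr(w_{it} \mid x_{it}, \bar{\mu}_j, \Theta)^{1(d_{it} = j)} \right]^{f_i(\bar{\mu}_1, \dots, \bar{\mu}_J)} \right]^{g_i(\bar{\mu}_s)} \qquad (19)$$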


The symbols $\prod_{\mu_1} \prod_{\mu_2} \cdots \prod_{\mu_J} \prod_{\mu_s}$ represent the continuous product of the inside expression evaluated for different values of $\bar{\mu}_1, \dots, \bar{\mu}_J, \bar{\mu}_s$ over the support $(-\infty, \infty)$. It is the product equivalent of the integral. As $\prod_{\mu_s}$ cycles through all potential values of schooling ability, the likelihood is multiplied by one if it is not evaluating the individual's data at their true type. Once it cycles to the individual's true schooling ability, it returns the associated likelihood value. The series of continuous products, $\prod_{\mu_1} \prod_{\mu_2} \cdots \prod_{\mu_J}$, functions in the same capacity. In the multi-dimensional case, the continuous products cycle over all dimensions of occupational ability. When the vector is being evaluated at a point different from the true occupational ability, it multiplies the likelihood by one. Once it cycles to the individual's true occupational ability vector, it returns the associated likelihood value. 

The benefit of maximizing the likelihood in equation (19) is that it contains only products and no sums. This means that when we take logs of the likelihood, the log operator passes completely through to the probabilities, so that the likelihood becomes a sequence of sums. Taking the log of equation (19), 

$$\ln(\mathcal{L}_i) = \int_{\mu_s} \left( \sum_{t=1}^{T} \sum_{c \in C} 1(d_{it} = c) \ln(\Omega_c(x_{it}, \bar{\mu}_s)) \right) g_i(\bar{\mu}_s) + \int_{\mu_s} \int_{\mu_1} \cdots \int_{\mu_J} \left( \sum_{t=1}^{T} \sum_{j=1}^{J} 1(d_{it} = j) \ln(\Pr(w_{it} \mid x_{it}, \bar{\mu}_j, \theta_j, \sigma_j)) \right) f_i(\bar{\mu}_1, \bar{\mu}_2, \dots, \bar{\mu}_J) \, g_i(\bar{\mu}_s) \qquad (20)$$

Evaluating the likelihood in equation (20) is infeasible because we do not know these individual $f_i$ and $g_i$ densities. The EM algorithm is implemented by repeatedly maximizing the likelihood in equation (20), where at each iteration we plug in estimates of $f_i$ and $g_i$ ($f_i^m$ and $g_i^m$), which are formed from the data and the parameter estimates of the previous iteration. Dempster et al. (1977) show that solving for the parameters in this fashion is equivalent to maximizing equation (17) directly. 

Given the m-th iteration's estimates of the parameters $\Omega^m$, $\Theta^m$, $\Delta^m$, $\delta_{sJ}^m$, $\lambda^m$, and $\sigma_\nu^m$, the densities $f_i^m(\cdot)$ and $g_i^m(\cdot)$ describe the probability distribution of the individual's unobserved abilities, $\mu_{i1}, \mu_{i2}, \dots, \mu_{iJ}, \mu_{is}$, conditional on the observed data and the parameters. These densities are calculated with Bayes' rule, using the population distribution as the prior and updating it with the observed data. 

First, the function $g_i^m(\bar{\mu}_s)$ is an estimated density describing the probability that an individual's schooling ability takes a particular value $\bar{\mu}_s$. This density is formed by using the prior distribution, given by $\bar{\mu}_s \sim N(z_i\lambda^m, (\sigma_\nu^m)^2)$, and updating it with the joint probability of the observed data conditional on $\mu_{is} = \bar{\mu}_s$. This posterior distribution will likely not have an analytical solution. In light of this, I use a discretized version of this distribution with K points of support. With K sufficiently large, this discrete distribution approaches the continuous distribution. 16 

The discrete distribution of $g_i^m(\bar{\mu}_s)$ is given by K discrete points, denoted by $\bar{\mu}_{sk}$ for $k \in \{1, 2, \dots, K\}$, and K probabilities, denoted by $q_{ik}^m$ for $k \in \{1, 2, \dots, K\}$. The K discrete points are equidistant, covering [-3, 3] standard deviations of the schooling ability distribution. The bounds associated with each point are, 

$$[\underline{b}_k, \bar{b}_k] = \begin{cases} \left( -\infty, \; -3 + \frac{6(2k-1)}{2(K-1)} \right] & \text{if } k = 1 \\ \left[ -3 + \frac{6(2k-3)}{2(K-1)}, \; -3 + \frac{6(2k-1)}{2(K-1)} \right] & \text{if } 1 < k < K \\ \left[ -3 + \frac{6(2k-3)}{2(K-1)}, \; +\infty \right) & \text{if } k = K \end{cases}$$

The probabilities, $q_{ik}^m$, represent $\Pr(\mu_{is} = \bar{\mu}_{sk})$ conditional on the m-th iteration's estimates of the parameters. Without observing any choices or wages, the prior belief of this probability conditional on the set of observables $z_i$ is, 

$$\pi_k^m(z_i, \lambda^m, \sigma_\nu^m) = \Pr(\underline{b}_k < \mu_{is} < \bar{b}_k \mid z_i, \lambda^m, \sigma_\nu^m) = \Phi\left( \frac{\bar{b}_k - z_i\lambda^m}{\sigma_\nu^m} \right) - \Phi\left( \frac{\underline{b}_k - z_i\lambda^m}{\sigma_\nu^m} \right) \qquad (21)$$

where $\Phi(\cdot)$ indicates the cumulative distribution function of the standard normal. 

16 In the empirical model, K = 25. 
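The discretization and the prior probabilities can be sketched in a few lines of Python. This is an illustration only, not the paper's code: the function names are hypothetical, the interval bounds are taken to be midpoints between adjacent grid points with the two outer intervals extended to infinity, and `z_lambda` stands for the scalar index formed from an individual's observables and the initial-conditions parameters.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def schooling_grid(K):
    """K >= 2 equidistant support points on [-3, 3] plus interval bounds.

    Bounds are midpoints between neighbouring points; the first and
    last intervals are open-ended so the K probabilities sum to one.
    """
    step = 6.0 / (K - 1)
    points = [-3.0 + step * k for k in range(K)]
    lower = [-math.inf] + [p - step / 2.0 for p in points[1:]]
    upper = [p + step / 2.0 for p in points[:-1]] + [math.inf]
    return points, lower, upper


def prior_probs(z_lambda, sigma_nu, lower, upper):
    """Prior probability that schooling ability falls in each interval,
    for an individual with index z_lambda and residual s.d. sigma_nu."""
    return [normal_cdf((hi - z_lambda) / sigma_nu)
            - normal_cdf((lo - z_lambda) / sigma_nu)
            for lo, hi in zip(lower, upper)]


points, lower, upper = schooling_grid(25)
probs = prior_probs(z_lambda=0.0, sigma_nu=1.0, lower=lower, upper=upper)
print(points[:3], sum(probs))
```

Because the outer intervals are unbounded, no prior mass is lost in the tails even though the support points stop at three standard deviations.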

Letting $\Pr(d_i \mid \cdot)$ denote the probability of all of the observed choices $d_i = \{d_{i1}, d_{i2}, \dots, d_{iT}\}$ and $\Pr(w_i \mid \cdot)$ denote the probability of all of the observed wages, then using this information and the prior above, $q_{ik}^m$ is the posterior distribution characterized by, 

$$\tilde{q}_{ik} \propto \pi_k^m(z_i) \cdot \Pr(d_i \mid x_i, \Omega^m, \bar{\mu}_{sk}) \cdot \Pr(w_i \mid x_i, \Theta^m, \Delta^m, \delta_{sJ}^m, \bar{\mu}_{sk}) \qquad (22)$$

where $x_i = \{x_{i1}, x_{i2}, \dots, x_{iT}\}$ is the observed data and $q_{ik}^m = \tilde{q}_{ik} / \sum_{k'=1}^{K} \tilde{q}_{ik'}$. 

More specifically, conditional on $\mu_{is} = \bar{\mu}_{sk}$ and the m-th iteration's estimates of the reduced form choice probabilities, $\Omega^m$, the joint probability of the observed choices in equation (22) is, 

$$\Pr(d_i \mid x_i, \Omega^m, \bar{\mu}_{sk}) = \prod_{t=1}^{T} \prod_{c \in C} \Omega_c^m(x_{it}, \bar{\mu}_{sk})^{1(d_{it} = c)} \qquad (23)$$
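The posterior type probabilities combine a prior with two likelihood terms and are then normalized across the K support points. Since products of many small probabilities underflow quickly, a practical implementation would typically work in logs. The sketch below does this; it is an assumption about implementation rather than a description of the paper's code, and the function name is hypothetical.

```python
import math


def posterior_type_probs(log_prior, log_choice_lik, log_wage_lik):
    """Normalized posterior over K schooling-ability types.

    Each argument is a length-K list of logs: the prior probability,
    the joint probability of observed choices, and the joint density
    of observed wages, all evaluated at each support point.
    """
    log_post = [a + b + c
                for a, b, c in zip(log_prior, log_choice_lik, log_wage_lik)]
    shift = max(log_post)                      # guards against underflow
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Two types, equal priors and choice likelihoods; wages favour type 2.
q = posterior_type_probs(
    log_prior=[math.log(0.5), math.log(0.5)],
    log_choice_lik=[0.0, 0.0],
    log_wage_lik=[math.log(0.2), math.log(0.6)])
print(q)
```

Subtracting the maximum before exponentiating leaves the normalized probabilities unchanged, because the common factor cancels in the ratio.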


Conditional on $\mu_{is} = \bar{\mu}_{sk}$, the probability of the observed wages, $\Pr(w_i \mid \cdot)$, is found by expressing the distribution of $w_i$ as a multivariate normal and evaluating the pdf at the observed values. The distribution will be dictated by the individual specific career paths, so it will be unique to each worker. To give an example of how this is constructed, assume a worker only works in periods $t$, $t'$, and $t''$, such that $d_{it} = j$, $d_{it'} = j'$, and $d_{it''} = j''$. The joint distribution of this worker's observed wages, $w_i = [w_{it}, w_{it'}, w_{it''}]'$, is, 

$$w_i \sim N(\bar{w}_i, W_i) \qquad (24)$$

$$\bar{w}_i = \begin{bmatrix} x_{it}\theta_j^m + \bar{\mu}_{sk}\delta_{sj}^m \\ x_{it'}\theta_{j'}^m + \bar{\mu}_{sk}\delta_{sj'}^m \\ x_{it''}\theta_{j''}^m + \bar{\mu}_{sk}\delta_{sj''}^m \end{bmatrix}$$

$$W_i = \begin{bmatrix} \Delta_{j,j}^m - (\delta_{sj}^m)^2 + (\sigma_j^m)^2 & \Delta_{j,j'}^m - \delta_{sj}^m\delta_{sj'}^m & \Delta_{j,j''}^m - \delta_{sj}^m\delta_{sj''}^m \\ \Delta_{j',j}^m - \delta_{sj'}^m\delta_{sj}^m & \Delta_{j',j'}^m - (\delta_{sj'}^m)^2 + (\sigma_{j'}^m)^2 & \Delta_{j',j''}^m - \delta_{sj'}^m\delta_{sj''}^m \\ \Delta_{j'',j}^m - \delta_{sj''}^m\delta_{sj}^m & \Delta_{j'',j'}^m - \delta_{sj''}^m\delta_{sj'}^m & \Delta_{j'',j''}^m - (\delta_{sj''}^m)^2 + (\sigma_{j''}^m)^2 \end{bmatrix}$$

The probability of the observed wages $w_i$ is then found by evaluating the pdf at this point, 

$$\Pr(w_i \mid x_i, \Theta^m, \Delta^m, \delta_{sJ}^m, \bar{\mu}_{sk}) = (2\pi)^{-n_i/2} \det(W_i)^{-1/2} \exp\left( -\tfrac{1}{2} (w_i - \bar{w}_i)' W_i^{-1} (w_i - \bar{w}_i) \right) \qquad (25)$$

where $n_i$ is the number of wage observations. 
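The construction of the wage mean vector and covariance matrix generalizes directly to any career path. The following standard-library sketch builds both from a list of worked periods and evaluates the log of the multivariate normal density through a Cholesky factorization. It is a hedged illustration: all function and argument names are hypothetical, `x_theta[t]` stands for the fitted wage index in worked period t, and the technology shock is assumed independent across periods so that its variance appears only on the diagonal.

```python
import math


def build_wage_moments(x_theta, occs, mu_sk, delta_s, Delta, sigma):
    """Mean vector and covariance of a worker's observed wages.

    x_theta : fitted wage index for each worked period
    occs    : occupation index worked in each of those periods
    """
    n = len(occs)
    mean = [x_theta[t] + mu_sk * delta_s[occs[t]] for t in range(n)]
    cov = [[Delta[occs[a]][occs[b]] - delta_s[occs[a]] * delta_s[occs[b]]
            + (sigma[occs[a]] ** 2 if a == b else 0.0)
            for b in range(n)] for a in range(n)]
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L L' = a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def mvn_logpdf(w, mean, cov):
    """Log density of a multivariate normal evaluated at w."""
    n = len(w)
    L = cholesky(cov)
    z = []                                   # solves L z = w - mean
    for i in range(n):
        z.append((w[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i)))
                 / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det
                   + sum(v * v for v in z))


mean, cov = build_wage_moments([1.0, 1.2], [0, 0], 0.5, [0.5], [[1.0]], [0.5])
print(mean, cov, mvn_logpdf([1.3, 1.4], mean, cov))
```

Wages in different periods are correlated even within the same occupation because they share the same unknown ability, which is why the off-diagonal terms are nonzero.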

Turning now to the other density, $f_i^m$, which represents the distribution of the individual's occupational ability, we condition on the observed data and the m-th iteration's estimate of the parameters, 

$$f_i^m(\bar{\mu}_1, \dots, \bar{\mu}_J \mid \bar{\mu}_s) = \Pr(\mu_{ij} = \bar{\mu}_j \;\; \forall j \in \{1, 2, \dots, J\} \mid w_i, x_i, \bar{\mu}_s, \Theta^m, \Delta^m, \delta_{sJ}^m) \qquad (26)$$

Three features of the density in equation (26) are important. First, the schooling ability, $\bar{\mu}_s$, is treated as data, so the density is conditional on this value. Second, the choices are not included when we condition on the data (past wages). This goes back to the fact that individuals condition their choices on their beliefs and not their real values of ability. Since beliefs are simply a function of the past wages and their schooling ability, once we condition on these values in equation (26), choices add no new information about the real value of abilities. Finally, the expression for the density function, $f_i^m$, has a very familiar form. Since the updates in the EM algorithm are formed using Bayes' rule, this distribution will be the same distribution as the beliefs the individuals have in the model once they condition on all of their wages (since the individuals are also Bayesian updaters). 

For each potential value of schooling ability $\bar{\mu}_{sk}$, the density is defined as, 

$$f_{ik}^m(\bar{\mu}_1, \bar{\mu}_2, \dots, \bar{\mu}_J) \sim N(E_{ik}^m, V_{ik}^m) \qquad (27)$$

where $E_{ik}^m$ is the estimated mean of the occupational ability given that schooling ability equals $\bar{\mu}_{sk}$, the parameters at iteration m, and the observed data, and $V_{ik}^m$ is the estimated covariance. The principle of the EM algorithm is that as we collect more data on an individual, the estimated covariance goes to zero and the estimated mean goes to their true value. 
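One way to compute such a posterior mean and covariance is to apply the normal-normal Bayesian update one wage at a time, which yields the same result as updating on all wages jointly. The sketch below is a generic conjugate update and not necessarily the expression used in the paper. It assumes the wage residual in occupation j equals the worker's ability in j plus an independent normal shock with standard deviation `sigma_j`; the function name and example values are hypothetical.

```python
def update_beliefs(E, V, j, resid, sigma_j):
    """One Bayesian update of beliefs about all J abilities.

    E, V    : current mean (length J) and covariance (J x J)
    j       : occupation in which the wage was observed
    resid   : observed wage minus its fitted index (ability + shock)
    sigma_j : standard deviation of the wage shock in occupation j
    """
    J = len(E)
    pred_var = V[j][j] + sigma_j ** 2          # variance of the residual
    gain = [V[a][j] / pred_var for a in range(J)]
    surprise = resid - E[j]
    E_new = [E[a] + gain[a] * surprise for a in range(J)]
    V_new = [[V[a][b] - gain[a] * V[j][b] for b in range(J)]
             for a in range(J)]
    return E_new, V_new


# Prior: two occupations with correlated abilities (illustrative values).
E0 = [0.0, 0.0]
V0 = [[1.0, 0.5], [0.5, 1.0]]
E1, V1 = update_beliefs(E0, V0, j=0, resid=1.0, sigma_j=1.0)
print(E1, V1)
```

In this example a wage observed in occupation 0 also moves the belief about occupation 1 and lowers its variance. This cross-occupation effect comes entirely from the nonzero off-diagonal covariance term; with a zero covariance, the belief about occupation 1 would be unchanged.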

$f_{ik}^m$ and $q_{ik}^m$ are individual densities. We can aggregate these up to get the population distribution parameters, $\Delta$ and $\delta_{sJ}$. The estimates at the (m + 1) iteration for these distribution parameters, $\Delta^{m+1}$ and $\delta_{sJ}^{m+1}$, are averages over individuals $i = 1, \dots, N$ and schooling types $k = 1, \dots, K$, weighted by $q_{ik}^m$ and normalized by $\sum_{i=1}^{N} \sum_{k=1}^{K} q_{ik}^m$ (28), where $V_{ik(j,j')}^m$ represents the j-th row and j'-th column of the covariance matrix $V_{ik}^m$, and $E_{ik(j)}^m$ refers to the j-th element of the vector $E_{ik}^m$. 

Finally, the parameters for the initial conditions equation are updated using $q_{ik}^m$ as, 

$$(\lambda^{m+1}, \sigma_\nu^{m+1}) = \arg\max_{\lambda, \sigma_\nu} \sum_{i=1}^{N} \sum_{k=1}^{K} q_{ik}^m \ln\left[ \Phi\left( \frac{\bar{b}_k - z_i\lambda}{\sigma_\nu} \right) - \Phi\left( \frac{\underline{b}_k - z_i\lambda}{\sigma_\nu} \right) \right] \qquad (29)$$

$$\text{s.t. } E(z_i\lambda) = 0, \quad E((z_i\lambda)^2) + \sigma_\nu^2 = 1$$

where $\Phi$ is the cdf of the standard normal. 

The estimator for the initial conditions is an interval estimator, reflecting the fact that we have a continuous latent variable $\mu_{is}$, but we only know the probability that it falls within certain bounds (i.e. $q_{ik}$ represents the probability that $\mu_{is} \in [\underline{b}_k, \bar{b}_k]$). The constraints enforce that the necessary normalizations bind. 

Implementing the EM algorithm requires the repeated computation of two steps, the expectation (E-step) and the maximization (M-step). So far I have completely characterized the expectation step, which describes the updates for $f_{ik}^m$, $q_{ik}^m$, $\Delta^m$, $\delta_{sJ}^m$, $\lambda^m$, and $\sigma_\nu^m$. These values are calculated outside of the maximization routine following the formulas outlined above. The remaining parameters in the model are the functions for the reduced form choice probabilities, $\Omega_c$, and the parameters of the wage equation, $\Theta$. To update these parameters, we maximize the likelihood in equation (20), plugging in the values derived in the expectation step. Equation (20) still has the continuous integration of schooling ability, so I replace it with the discrete approximation, so that the new log-likelihood is, 

$$\ln(\mathcal{L}_i) = \sum_{k=1}^{K} q_{ik}^m \sum_{t=1}^{T} \sum_{c \in C} 1(d_{it} = c) \ln(\Omega_c(x_{it}, \bar{\mu}_{sk})) + \sum_{k=1}^{K} q_{ik}^m \sum_{t=1}^{T} \sum_{j=1}^{J} 1(d_{it} = j) \int_{\mu_1} \cdots \int_{\mu_J} \ln(\Pr(w_{it} \mid x_{it}, \bar{\mu}_j, \theta_j, \sigma_j)) \, f_{ik}^m(\bar{\mu}_1, \bar{\mu}_2, \dots, \bar{\mu}_J) \qquad (30)$$

Two features of this likelihood function make it appealing to maximize. First, the parameters $\Omega_c$ and the wage parameters, $\Theta$, are additively separable. This means that we can maximize these two components of the likelihood separately, which avoids the computational burden of jointly maximizing all of the parameters. The update of the reduced form choice probabilities solves, 

$$\Omega^{m+1} = \arg\max_{\Omega} \sum_{i=1}^{N} \sum_{k=1}^{K} q_{ik}^m \sum_{t=1}^{T} \sum_{c \in C} 1(d_{it} = c) \ln(\Omega_c(x_{it}, \bar{\mu}_{sk})) \qquad (31)$$

The second attractive feature of the likelihood in equation (30) is that the wage observations are no longer collectively tied together through a multivariate integral. Instead, each wage is only integrated over the occupational ability in that single occupation. We can replace the density $f_{ik}^m(\bar{\mu}_1, \dots, \bar{\mu}_J)$ with its relevant marginal density, $f_{ik}^m(\bar{\mu}_j) \sim N(E_{ik(j)}^m, V_{ik(j,j)}^m)$, where the expressions $E_{ik(j)}^m$ and $V_{ik(j,j)}^m$ are defined in equation (28). Writing the probability of the observed wage with respect to the unobserved technology shock, $\eta \sim N(0, \sigma_j^2)$, as, 

$$\Pr(w_{it} \mid x_{it}, \bar{\mu}_j, \theta_j, \sigma_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{(w_{it} - x_{it}\theta_j - \bar{\mu}_j)^2}{2\sigma_j^2} \right) \qquad (32)$$

and taking the log of this probability and integrating over the unobserved occupational ability, yields, 

$$\int_{\mu_j} \ln(\Pr(w_{it} \mid x_{it}, \bar{\mu}_j, \theta_j, \sigma_j)) \, f_{ik}^m(\bar{\mu}_j) = -\frac{1}{2}\ln(2\pi\sigma_j^2) - \frac{V_{ik(j,j)}^m + (w_{it} - x_{it}\theta_j - E_{ik(j)}^m)^2}{2\sigma_j^2} \qquad (33)$$

Equation (33) has a closed form solution for the integration of the unobserved occupational ability. We can take advantage again of the additive separability in equation (30) and estimate the wage parameters separately for each occupation. Plugging equation (33) into the likelihood in equation (30) and maximizing with respect to $\theta_j$ gives the update for $\theta_j^{m+1}$ as, 

$$\theta_j^{m+1} = \arg\min_{\theta_j} \sum_{i=1}^{N} \sum_{k=1}^{K} q_{ik}^m \sum_{t=1}^{T} 1(d_{it} = j) \left[ w_{it} - x_{it}\theta_j - E_{ik(j)}^m \right]^2 \qquad (34)$$

which is simply solved using weighted ordinary least squares. 

Equation (34) provides a clear illustration of how the estimator addresses selection. $E_{ik(j)}^m$ is an individual specific mean. If individuals with high ability are more likely to have high tenure, then this effect will be corrected for by the mean, rather than absorbed into the returns to tenure. Taking the derivative with respect to $\sigma_j$ gives a closed form expression for updating $\sigma_j^{m+1}$, 

$$(\sigma_j^{m+1})^2 = \frac{\sum_{i=1}^{N} \sum_{k=1}^{K} q_{ik}^m \sum_{t=1}^{T} 1(d_{it} = j) \left( V_{ik(j,j)}^m + (w_{it} - x_{it}\theta_j^{m+1} - E_{ik(j)}^m)^2 \right)}{\sum_{i=1}^{N} \sum_{k=1}^{K} q_{ik}^m \sum_{t=1}^{T} 1(d_{it} = j)} \qquad (35)$$

17 For the empirical work, the function $\Omega$ is a flexible logit of the state variables, $x_{it}$ and $\bar{\mu}_{sk}$. 
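The two wage updates for a single occupation can be sketched as a weighted least squares regression followed by a weighted variance calculation. The code below is a minimal standard-library illustration under stated assumptions: each row is one (individual, type, period) combination in which the occupation was chosen; the tuple layout `(q, x, w, E, V)`, the helper `solve_linear`, and the example numbers are all hypothetical.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def m_step_wages(rows):
    """Weighted OLS for theta_j, then the closed-form update for sigma_j.

    rows: list of (q, x, w, E, V) with weight q, regressors x, wage w,
    and posterior mean E and variance V of ability in this occupation.
    """
    p = len(rows[0][1])
    XtX = [[0.0] * p for _ in range(p)]
    Xty = [0.0] * p
    for q, x, w, E, V in rows:
        y = w - E                      # wage net of expected ability
        for a in range(p):
            Xty[a] += q * x[a] * y
            for b in range(p):
                XtX[a][b] += q * x[a] * x[b]
    theta = solve_linear(XtX, Xty)
    num = sum(q * (V + (w - sum(xa * ta for xa, ta in zip(x, theta)) - E) ** 2)
              for q, x, w, E, V in rows)
    den = sum(q for q, *_ in rows)
    return theta, math.sqrt(num / den)


rows = [(0.5, [1.0, 0.0], 1.3, 0.3, 0.25),
        (1.0, [1.0, 1.0], 2.8, -0.2, 0.25),
        (0.25, [1.0, 2.0], 5.1, 0.1, 0.25)]
print(m_step_wages(rows))
```

Subtracting the posterior mean ability from the wage before regressing is what keeps ability-driven selection out of the estimated returns, and the posterior variance term keeps the shock variance from being understated.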

Summarizing the Algorithm 

The EM algorithm is used in the first stage to solve for all of the model parameters except the structural utility parameters. Instead, a reduced form logit model is used to account for dynamic selection, postponing the estimation of the structural model to the second stage. The EM algorithm breaks the joint likelihood in the parameters, allowing us to solve for the choice parameters and the wage parameters separately. In addition, we are able to eliminate the multiple integration of the occupational abilities and substitute it with a closed form for the expectation, so that the wage parameters are easily maximized using weighted ordinary least squares. 

Step 1. Given the m-th iteration parameter estimates, $\Theta^m$, $\Delta^m$, $\delta_{sJ}^m$, $\lambda^m$, and $\sigma_\nu^m$, solve for the probability that individual i's true schooling type is $\bar{\mu}_{sk}$, $q_{ik}^m$, for each $k \in \{1, 2, \dots, K\}$. 

Step 2. Solve for $f_{ik}^m(\bar{\mu}_1, \bar{\mu}_2, \dots, \bar{\mu}_J)$, the individual's probability density for their occupational ability conditional on their schooling ability being $\bar{\mu}_{sk}$. 

Step 3. Using $q_{ik}^m$ and the wage parameters, $\Theta^m$, solve for the beliefs over occupational ability $b_{it}^m$ for each $\bar{\mu}_{sk}$. 

Step 4. Update the population parameters $\Delta^{m+1}$, $\delta_{sJ}^{m+1}$, $\lambda^{m+1}$, and $\sigma_\nu^{m+1}$ with $f_{ik}^m(\bar{\mu}_1, \bar{\mu}_2, \dots, \bar{\mu}_J)$ and $q_{ik}^m$. 

Step 5. Maximize the reduced form probabilities $\Omega^{m+1}$ given $q_{ik}^m$, the beliefs $b_{it}^m$, and expected wages using $\Theta^m$. (Although it is sufficient to flexibly express the reduced form choice probabilities as only a function of the data, since the EM algorithm allows us to get estimates for the unobserved beliefs and expected wages, these will be included also.) 

Step 6. Do weighted OLS for each occupation $j \in \{1, 2, \dots, J\}$ to update $\theta_j^{m+1}$ and $\sigma_j^{m+1}$. 

The algorithm repeats steps 1-6 until the change in the log-likelihood is less than some convergence criterion, $\left( \sum_{i=1}^{N} \ln(\mathcal{L}_i) \right)^{m+1} - \left( \sum_{i=1}^{N} \ln(\mathcal{L}_i) \right)^{m} < 1\text{e-}4$. 

3.3 Second Stage: Solving the Dynamic Discrete Choice Problem 

With the first stage estimates, we now move to estimating the structural utility parameters. From 
the first stage, we have recovered consistent estimates of the beliefs, \(b_{it}\), and the wage parameters, \(\Theta\), 
all of which play an important role in solving the structural model. Although the problem is simplified 
by recovering these parameters outside of the structural model, estimating the dynamic discrete 
choice model is still extremely complicated because evaluating the choice probabilities requires 
solving for the individual's expected future value associated with each choice. To address this 
state space problem, I use the method of conditional choice probabilities (CCPs) 
developed by Hotz and Miller (1993) and Arcidiacono and Miller (2010), which avoids having to solve 
the model at every point in the state space. 

To solve for the structural parameters, we need to maximize the log-likelihood, 

\[ \ln \mathcal{L}(A) = \sum_{k=1}^{K} \hat{q}_{ik} \sum_{t=1}^{T} \ln\left(\Pr(d_{it} \mid x_{it}, \hat{b}_{it}, \mu_{sk}, A, \hat{\Theta})\right) \tag{36} \]

where \(A\) are the structural utility parameters and the hatted variables are estimated in the first 
stage. The choice probabilities in equation (36) are taken with respect to the unobserved, 
per-period random utility shock \(\varepsilon\). Assuming these unobservables are distributed i.i.d. type-I extreme 
value, we can use the conditional value functions in equation (10) in section 2.3 to express these 
probabilities, 


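\[ \Pr(d_{it} = c \mid x_{it}, b_{it}, \mu_{sk}, A, \Theta) = \frac{\exp(v_{ict}(\cdot))}{\sum_{c'} \exp(v_{ic't}(\cdot))} \tag{37} \]

where 

\[ v_{ict}(\cdot) = u_{ict}(S_{it}) + \beta E[V_{t+1}(\cdot) \mid S_{it}, d_{it} = c]. \]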


The expected future value term \(E[V_{t+1}(\cdot)]\) is a complicated function of all potential future career 
sequences, where the expectation is taken over all possible ability signals in the current and future 
periods, as well as next period's and all future non-pecuniary choice-specific utility shocks, \(\varepsilon\). Rust 
(1987) shows that, given that these utility shocks are additively separable in the utility function 
and that they are type-I extreme value, the expectation with respect to next period's utility shock 
has a closed form expression as the log of the sum of the exponentials of next period's conditional value 
functions, 

\[ E_{\varepsilon} V_{t+1}(S, \varepsilon) = \ln\left(\sum_{c'} \exp(v_{ic't+1}(S))\right) + \gamma, \]

where \(\gamma\) is Euler's constant. 


The term \(v_{ic't+1}\) represents the conditional value function of choice \(c'\) in the next period and the 
sum is over all feasible choices. 

Hotz and Miller (1993) show that if the expression inside of the log is multiplied and 
divided by \(\exp(v_{ict+1}(S))\), for any choice \(c\), then the expected future value term can be equivalently 
expressed as the choice-specific value function minus the log of the conditional choice probability, 


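\[
\begin{aligned}
E_{\varepsilon} V_{t+1}(S, \varepsilon) &= \ln\left(\frac{\left(\sum_{c'} \exp(v_{ic't+1}(S))\right) \exp(v_{ict+1}(S))}{\exp(v_{ict+1}(S))}\right) + \gamma \\
&= \ln\left(\frac{1}{\Omega_c(S)}\right) + v_{ict+1}(S) + \gamma \\
&= -\ln(\Omega_c(S)) + v_{ict+1}(S) + \gamma
\end{aligned}
\tag{38}
\]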


where \(\Omega_c(S)\) is the conditional choice probability of making choice \(c\) given state \(S\). 
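
The inversion in equation (38) is easy to check numerically. The sketch below uses hypothetical conditional values `v_next` (they are not estimates from the paper) and confirms that the log-sum expression of Rust (1987) and the CCP-based expression agree no matter which choice \(c\) is used as the reference. The helper names are illustrative only.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def choice_probs(v):
    """Logit choice probabilities implied by the conditional values v."""
    m = max(v)                                   # subtract the max for stability
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]


def emax_logsum(v):
    """Log of the sum of exponentials of v, plus Euler's constant."""
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v)) + EULER_GAMMA


def emax_inversion(v, c):
    """Equation (38): -ln(Omega_c) + v_c + gamma for a reference choice c."""
    return -math.log(choice_probs(v)[c]) + v[c] + EULER_GAMMA


v_next = [0.3, -1.2, 2.0, 0.0]   # hypothetical next-period conditional values
direct = emax_logsum(v_next)
via_ccp = [emax_inversion(v_next, c) for c in range(len(v_next))]

# Subtracting one choice's value from all values leaves the probabilities unchanged.
normalized = choice_probs([x - v_next[3] for x in v_next])
```

Every entry of `via_ccp` equals `direct` up to floating-point error, which is the sense in which there are as many equivalent representations as there are choices. The last line illustrates why only differences in value functions matter for the choice probabilities.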

While the Hotz and Miller (1993) choice probability inversion is helpful in solving for \(E[V_{t+1}(\cdot)]\), 
we are left to evaluate next period's choice-specific value function, \(v_{ict+1}(S)\), which contains 
the expectation of the two-period-ahead value function \(EV_{it+2}(S', \varepsilon \mid S, d_{it+1} = c)\). Again, not being 
able to compute this value, we need an alternative representation. One approach would be to apply 
equation (38) again; however, using CCPs in this manner would require taking the 
problem all the way out to the terminal period \(T\). 

Arcidiacono and Miller (2010) demonstrate that dynamic models that possess the property 
of finite dependence can use CCPs in a way that does not require solving the model to the 
terminal period. In fact, with finite dependence, only CCPs a few periods ahead are required. Finite 
dependence takes advantage of two features of the problem. First, equation (38) holds for any 
choice \(c\), meaning that with \(J + 2\) choices, there are \(J + 2\) equivalent ways of expressing \(E[V_{t+1}(\cdot)]\). 
Second, in estimating any discrete choice model, only differences in utilities 
matter. The idea proposed by Arcidiacono and Miller (2010) is that if we strategically express 
the expected future value term along a sequence of choices such that, when we difference two value 
functions, the remaining expected future value differences out, then we can estimate the structural 
model using only CCPs a few periods ahead. In the case of the model proposed here, we need 
to look at two-period-ahead conditional choice probabilities. 

In estimation, I normalize all of the value functions against the unemployment choice. Therefore 
the probability functions are 

\[ \Pr(d_{it} = c \mid x_{it}, b_{it}, \mu_{sk}, A, \Theta) = \frac{\exp(v_{ict}(\cdot) - v_{iut}(\cdot))}{\sum_{c'} \exp(v_{ic't}(\cdot) - v_{iut}(\cdot))} \tag{39} \]


Finite dependence suggests expressing the above conditional value functions strategically so 
that the three-period-ahead expected future value term differences out. This strategic differencing 
is done for every possible choice. To demonstrate how this works, I outline the expression for 
the conditional value of working in occupation 1 (i.e. \(c = 1\)), subtracting the conditional value 
function of unemployment in the same period. Given there are \(J + 2\) choices which can be made in 
the next period, we can express today's value function of choosing occupation 1 in \(J + 2\) ways. If we 
apply the conditional choice probability inversion again to replace the expected future value in the next 
period, then there are \((J + 2)^2\) equivalent ways to express the conditional value of choosing 
occupation 1 today. One of the \((J + 2)^2\) equivalent ways is 


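\[
\begin{aligned}
v_{i1t}(S^0) = u_{i1t}(S^0) + \beta \int_{\zeta_1} \Big[ & -\ln(\Omega_u(S^{a1})) + u_{iut+1}(S^{a1}) \\
& + \beta \big( -\ln(\Omega_u(S^{a2})) + u_{iut+2}(S^{a2}) + \beta EV_{it+3}(S^{a3}) \big) \Big]
\end{aligned}
\tag{40}
\]

where 

\[
\begin{aligned}
S^{a1} &= \{S^0, (age_{it} + 1), (expr_1 + 1), d'_{it} = 1, \zeta_1\} \\
S^{a2} &= \{S^0, (age_{it} + 2), (expr_1 + 1), d'_{it+1} = u, \zeta_1\} \\
S^{a3} &= \{S^0, (age_{it} + 3), (expr_1 + 1), d'_{it+2} = u, \zeta_1\} \\
\zeta_1 &\sim \mathcal{N}(E_{it}(\mu_{i1}), V_{it}(\mu_{i1}) + \sigma_1^2)
\end{aligned}
\]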


which represents the sequence \(\{d'_{it} = \text{occupation 1}, d'_{it+1} = \text{unemployment}, d'_{it+2} = \text{unemployment}\}\), 
where the primes (\('\)) represent a potential sequence. \(S^{a1}\), \(S^{a2}\), and \(S^{a3}\) represent how the state space 
evolves one, two, and three periods ahead respectively, given this potential sequence. For the 
worker, what is relevant three periods away is that they will be three years older than they were 
at \(S^0\), they will have one more year of occupation 1 experience, they will have been unemployed in 
the previous period, and they will have received one ability signal for occupation 1. 

Likewise, there are \((J + 2)^2\) equivalent ways we can define the conditional value of being 
unemployed in the current period. One such way is the potential path \(\{d'_{it} = \text{unemployment}, d'_{it+1} = \text{occupation 1}, d'_{it+2} = \text{unemployment}\}\), which is 


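\[
\begin{aligned}
v_{iut}(S^0) = u_{iut}(S^0) + \beta \int_{\zeta_1} \Big[ & -\ln(\Omega_1(S^{b1})) + u_{i1t+1}(S^{b1}) \\
& + \beta \big( -\ln(\Omega_u(S^{b2})) + u_{iut+2}(S^{b2}) + \beta EV_{it+3}(S^{b3}) \big) \Big]
\end{aligned}
\tag{41}
\]

where 

\[
\begin{aligned}
S^{b1} &= \{S^0, (age_{it} + 1), (expr_1 + 0), d'_{it} = u\} \\
S^{b2} &= \{S^0, (age_{it} + 2), (expr_1 + 1), d'_{it+1} = 1, \zeta_1\} \\
S^{b3} &= \{S^0, (age_{it} + 3), (expr_1 + 1), d'_{it+2} = u, \zeta_1\} \\
\zeta_1 &\sim \mathcal{N}(E_{it}(\mu_{i1}), V_{it}(\mu_{i1}) + \sigma_1^2)
\end{aligned}
\]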


Finite dependence shows up in the fact that the expected state vectors three periods away are 
equivalent for both sequences (i.e. \(\beta^3 \int_{\zeta_1} EV_{it+3}(S^{a3}) = \beta^3 \int_{\zeta_1} EV_{it+3}(S^{b3})\)). When we difference these 
two expressions of the conditional value functions to estimate the parameters, the unknown value 
will difference out. This leaves for estimation, 

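\[
\begin{aligned}
v_{i1t}(S^0) - v_{iut}(S^0) = {} & u_{i1t}(S^0) - \beta \int_{\zeta_1} \ln(\Omega_u(S^{a1})) - \beta^2 \int_{\zeta_1} \ln(\Omega_u(S^{a2})) \\
& - \Big[ -\beta \int_{\zeta_1} \ln(\Omega_1(S^{b1})) + \beta \int_{\zeta_1} u_{i1t+1}(S^{b1}) - \beta^2 \int_{\zeta_1} \ln(\Omega_u(S^{b2})) \Big]
\end{aligned}
\tag{42}
\]

where the flow utility for unemployment is normalized to zero (i.e. \(u_{iut} = 0\)). 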

This representation of the differenced value function reduces down to an expression which 
contains only the contemporaneous utility functions and conditional choice probabilities. Repeating 
this strategic differencing for each of the other choices, we can get an expression for the structural 
utility model without directly solving the optimization problem, or doing any backwards recursion 
from the terminal period. 
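
The cancellation relies on the two choice sequences leading to the same state three periods ahead. A minimal sketch of that bookkeeping follows. It assumes a stripped-down state (age, occupation 1 experience, previous choice, number of occupation 1 signals received); the actual state in the model also carries the realized signal and other variables, so the tuple layout and the `step` function are illustrative assumptions rather than the paper's code.

```python
def step(state, choice):
    """Advance a simplified state by one period.

    state = (age, expr1, last_choice, n_signals1); choice is "occ1" or "u".
    """
    age, expr1, _, n_signals1 = state
    if choice == "occ1":
        # Working in occupation 1 adds a year of experience and one ability signal.
        return (age + 1, expr1 + 1, "occ1", n_signals1 + 1)
    return (age + 1, expr1, "u", n_signals1)


def follow(state, path):
    """Apply a sequence of choices to a starting state."""
    for choice in path:
        state = step(state, choice)
    return state


s0 = (20, 0, "u", 0)                        # hypothetical starting state S^0
s_a3 = follow(s0, ["occ1", "u", "u"])       # sequence behind equation (40)
s_b3 = follow(s0, ["u", "occ1", "u"])       # sequence behind equation (41)
```

Both paths end at `(23, 1, "u", 1)`: three years older, one more year of occupation 1 experience, unemployed in the previous period, and one signal received. The intermediate states differ, which is why the one- and two-period-ahead CCPs remain in equation (42) while the three-period-ahead value drops out.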

The conditional choice probabilities were estimated in the first stage. Plugging these values in 
as data, I use the expressions above for the differenced conditional value functions to solve for the 
structural utility parameters, 

\[ \hat{A} = \arg\max_{A} \sum_{k=1}^{K} \hat{q}_{ik} \sum_{t=1}^{T} \ln\left(\Pr(d_{it} \mid x_{it}, \hat{b}_{it}, \mu_{sk}, A, \hat{\Theta}, \hat{\Omega}_c)\right) \tag{43} \]

4 Data Extract and Summary: NLSY97 


The model is estimated using the 1997 cohort of the National Longitudinal Survey of Youth 
(NLSY97). The NLSY97 is a nationally representative sample of men and women who were born 
between 1980 and 1984. These individuals were between the ages of 12 and 18 at the time of their 
first interview in 1997, and respondents are interviewed annually. I use rounds 1 through 12 for 
the analysis, where the respondents are between the ages of 23 and 29 in round 12.\(^{18}\) On an 
annual basis, interviewers collect detailed information on the respondent's education and employment 
activities retrospectively to the date of the last interview. 

Based on the responses to the retrospective education and employment questions, each year 
an individual is assigned to one of twelve mutually exclusive activities: unemployment, 
schooling, or employment in one of 10 occupations {1-Management, business, and financial operations 
occupations, 2-Professional and related occupations, 3-Other Service occupations, 4-Food Service 
occupations, 5-Sales and related occupations, 6-Office and administrative support occupations, 
7-Construction and extraction occupations, 8-Installation, maintenance, and repair occupations, 
9-Production occupations, 10-Transportation and material moving occupations}. 

The assignment of activities was done sequentially, starting with education. If the individual 
reports attending school for five out of the 12 months in a year for high schoolers (four out of 12 
months for college students) and reported an increase in their highest grade completed between 
interviews, then they were categorized as enrolled in school for the period.\(^{19}\) If information is not 
available for the entire year, for example in their last interview round, then the attendance in the 
available months was converted to a 12-month equivalent and the same criteria applied. In addition, 
in the last round, no information is available regarding whether they completed a grade that year, 
so attendance was the sole criterion in determining the schooling category. 

18 Round 13 is the most current round available. This survey was conducted in 2009. Including year 2009 had 
drastic effects on the unemployment rate of these workers, which calls into question the stationarity assumption of 
the structural choice parameters. Conceivably, aggregate shocks could be introduced into the model, but that is 
outside the scope of this paper. However, the round 13 information was used to supplement the round 12 data where 
possible to provide a more complete picture of the individual's activities in round 12, rather than relying on a partial 
year's worth of data. 

19 In a few cases grade completion was hard-coded to more accurately reflect school progression. For example, if the 
individual reported attending college for two years but did not report a grade increase in the subsequent year, but 
reported skipping a grade two periods away (i.e. 12, 12, 14), then the middle year was re-coded as 13, assuming they 
were continuously in school rather than dropping out and then skipping a grade. Only a handful of records required 
this scrutiny. 

Table 1: Occupation Categories 

Occ.   2002 3-Digit 
Cat.   Census Code    Description 

1      0010-0950      Management, business, and financial operations occupations 
                        0010-0430 Management occupations 
                        0500-0950 Business and financial operations occupations 
2      1000-3540      Professional and related occupations 
                        1000-1240 Computer and mathematical occupations 
                        1300-1560 Architecture and engineering occupations 
                        1600-1960 Life, physical, and social science occupations 
                        2000-2060 Community and social services occupations 
                        2100-2150 Legal occupations 
                        2200-2550 Education, training, and library occupations 
                        2600-2960 Arts, design, entertainment, sports, and media occupations 
                        3000-3540 Healthcare practitioner and technical occupations 
3      3600-3950;     Non-Food Service Occupations 
       4200-4650        3600-3650 Healthcare support occupations 
                        3700-3950 Protective service occupations 
                        4200-4250 Building and grounds cleaning and maintenance occupations 
                        4300-4650 Personal care and service occupations 
4      4000-4160      Food preparation and serving related occupations 
5      4700-4960      Sales and related occupations 
6      5000-5930      Office and administrative support occupations 
7      6000-6940      Construction and extraction occupations 
                        6000-6130 Farming, fishing, and forestry occupations 
                        6200-6940 Construction and extraction occupations 
8      7000-7620      Installation, maintenance, and repair occupations 
9      7700-8960      Production occupations 
10     9000-9750      Transportation and material moving occupations 

If the respondent did not meet the school enrollment criteria and reported working at least 25 
weeks in the year in a job working at least 20 hours per week, then the individual was 
considered employed. Again, if the individual was not present for the entire year, the weeks observed 
were converted to an annual equivalent. If the individual worked multiple full-time jobs, the 
attributes of the job with the most full-time weeks were assigned for that year. For each job, the 
respondent identifies the 3-digit 2002 Census occupation code and hourly compensation rate. Given 
the reported 3-digit occupation, the individual is assigned to one of the 10 occupation categories 
following table 1. The wage is assigned using the hourly compensation rate of pay, which includes all 
forms of monetary compensation affiliated with the job and is deflated using the consumer price 
index to year 2002 dollars. 

The main purpose of using the estimation strategy outlined in the previous section is that 
the computational difficulty only grows linearly in the number of occupations in the model. In this 
sense, the empirical model can accommodate a larger choice set than the ten aggregated occupation 
codes described above. However, what prohibits us from analyzing occupations at a finer level of 
detail is sample size. In the current specification all of the wage parameters and choice parameters 
are occupation specific. Including the cross-occupational correlation terms, the technology shock 
standard error, and other parameters, each occupation is associated with about 27 parameters. 
Without additional assumptions (e.g. forcing the wage returns to education to be constant across 
occupations), we require a reasonable number of observations in these occupations to identify 
the parameters. The ten occupation categories were drawn such that each occupation category 
provided at least 350 wage observations and the categories were consistent with the 2002 occupation 
classification system. Table 1 shows how the 2002 1-digit census occupation codes map to the ten 
occupations used in the empirical analysis. Six of the 10 occupation groups contain a single 1-digit 
occupation code. Two of the ten combine only two 1-digit occupation codes. Only professional and 
other service occupations combine multiple 1-digit occupation codes. 
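
The mapping in table 1 can be written as a simple range lookup. The sketch below transcribes the code ranges from table 1; the function name `occupation_category` and the choice to return `None` for codes outside every range are assumptions made for illustration.

```python
# (low, high, category) ranges transcribed from table 1. Codes are the
# 2002 Census occupation codes written as integers (e.g. 0010 becomes 10).
CATEGORY_RANGES = [
    (10, 950, 1),      # Management, business, and financial operations
    (1000, 3540, 2),   # Professional and related
    (3600, 3950, 3),   # Non-food service: healthcare support, protective service
    (4000, 4160, 4),   # Food preparation and serving related
    (4200, 4650, 3),   # Non-food service: cleaning, personal care
    (4700, 4960, 5),   # Sales and related
    (5000, 5930, 6),   # Office and administrative support
    (6000, 6940, 7),   # Farming/fishing/forestry, construction and extraction
    (7000, 7620, 8),   # Installation, maintenance, and repair
    (7700, 8960, 9),   # Production
    (9000, 9750, 10),  # Transportation and material moving
]


def occupation_category(census_code):
    """Return the 1-10 occupation category, or None if the code is unmapped."""
    for low, high, category in CATEGORY_RANGES:
        if low <= census_code <= high:
            return category
    return None
```

Note that category 3 is the only one built from two non-adjacent code ranges, because the food service codes (4000-4160) sit between them and form their own category.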

If an individual is not assigned to schooling or one of the 10 occupation employment activities, 
then they are assigned to unemployment (or home production). 
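
The sequential assignment rule described above can be summarized in a few lines. This is a sketch under stated assumptions, not the code used for the paper: the thresholds are read as "at least" five (or four) months of attendance, partial-year records are scaled linearly to a 12-month equivalent, and the function and argument names are hypothetical.

```python
def annualize(count, months_observed):
    """Convert a count observed over months_observed months to a 12-month equivalent."""
    return count * 12.0 / months_observed


def assign_activity(school_months, in_college, grade_increased,
                    weeks_worked, hours_per_week,
                    months_observed=12, last_round=False):
    """Assign "school", "employed", or "unemployed", checking schooling first."""
    school_months = annualize(school_months, months_observed)
    weeks_worked = annualize(weeks_worked, months_observed)

    # Attendance threshold: five months for high school, four for college.
    months_required = 4 if in_college else 5
    attended = school_months >= months_required

    # In the last round grade completion is unknown, so attendance alone decides.
    if attended and (grade_increased or last_round):
        return "school"
    if weeks_worked >= 25 and hours_per_week >= 20:
        return "employed"
    return "unemployed"
```

For example, a respondent observed for six months who worked 13 weeks at 40 hours per week is treated as having worked 26 weeks on an annual basis and is classified as employed. Assigning the occupation category of the job with the most full-time weeks would be a separate step.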

The analysis focuses on the white male cross-sectional sample from the data. This group consists 
of 2,284 individuals. The discrete decision period corresponds to the school year (September to 
August). The decisions of individuals are tracked continuously from age 16 (\(t = 1\)) until round 12 
unless any one of five events occurs: 1) the gap between interview dates exceeds 16 months, 2) 
their highest grade completed or degree completed is invalid in the survey data, 3) the individual's 
primary activity was employment, but no occupation was reported or the reported wage was less 
than $5 per hour or greater than $100 per hour, 4) they joined the military, or 5) they reported completing 
20 or more years of education.\(^{20}\) Of the original population, 2,125 individuals survive to age 16, with 
the final sample containing 15,198 person-year observations, averaging 7.15 years per individual. 

Table 2 summarizes the distribution of activities by age, as well as the number of observations 
at each age. As expected, unemployment for these young workers is relatively high, ranging between 
10% and 15%. The share of employment is not even across the 10 occupations. At age 24, when about 
half of the population is still in the sample, the largest shares of employment are in professional, 
sales, or construction occupations. Management, office, and transportation have about average 
representation, and service, food, maintenance, and production have below average shares. 

20 There appears to be a large fraction of medical students who meet the employment criteria with 20 years of education. 
Mean wages increase monotonically across education levels except from 19 years to 20 years, where the average wage 
falls 28%. These outliers have a strong effect on the returns to education, so these 20 observations are dropped. 

Table 3 describes in greater detail the employment activities of white males between the ages of 
16 and 28. For each of the ten occupation categories, table 3 reports the top four 3-digit detailed 
occupation codes, as well as the percentage of observed employment records in those occupations. 
For food, sales, and transportation occupations, the top four detailed occupations represent more 
than 70% of the employment records in those categories, implying that white males are 
concentrated in a handful of 3-digit occupation codes. By comparison, only 25% of employment 
records are represented by the top four detailed occupations in professional occupations, suggesting 
that workers are much more spread across the 3-digit detailed occupations in professional occupations. 

Previous papers modeling career decisions (e.g. Neal (1999); Pavan (2009); Kambourov and 
Manovskii (2009)) using the NLSY79 or Panel Study of Income Dynamics (PSID) data have 
documented the high potential for measurement error when the reported occupation code is taken directly 
from the data. To avoid counting false career changes, these papers impose a number of edits on 
the occupational data. The primary edit is to not consider any occupational change unless it is 
accompanied by a change in employer. Yamaguchi (2010) points out that this may be an 
undesirable restriction on the data as it is likely to exclude important career changes as individuals are 
promoted within the firm. 

The NLSY97 differs from the NLSY79 and PSID in that it likely does not suffer from 
systematic measurement error in the reported occupation. The NLSY97 is conducted with a 
computer-assisted interview system which allows interviewers to refer back to the responses from the respondent's 
previous year's interview. Interviewees are first read their previous year's job description and are 
asked if that continues to define their job function. The occupation code only changes if they report 
a change in duties. Pavan (2009), using the NLSY79 data, cites evidence of spurious reported 
occupation changes by the fact that 40% of individuals remaining with the same employer in consecutive 
periods report a change in 3-digit occupations. The analogous figure for the NLSY97 data at the 
3-digit detail level is only 12% of workers. This is a reasonable figure representing mobility within 
firms. The fact that this number is not overstated provides reasonable assurance of the reliability 
of the observed occupation changes. This feature makes the NLSY97 particularly desirable for a 
model that looks at a finer level of occupational choice. 

Table 3: Description of Occupation Categories 

Occupation Categories               Top-Four 3-digit Occupations            Num. of   Percent of Obs. 
(# of 3-digit census codes)         (% of obs. in category)                 Obs.      in Top-Four 

Management, business, and           Managers, all other (11.8)              373       38.4% 
financial operations                Accountants and Auditors (9.1) 
occupations (52)                    Construction Managers (8.9) 
                                    Management Analysts (8.6) 

Professional and related            Computer software engineers (7.7)       600       24.5% 
occupations (123)                   Computer Support Specialist (6.3) 
                                    Secondary School teachers (5.7) 
                                    Network System/Data analysts (4.8) 

Non-Food Service                    Grounds worker (21.1)                   412       60.7% 
Occupations (49)                    Janitor (15.3) 
                                    Security Guard (14.1) 
                                    Nursing/Home health aides (10.2) 

Food preparation and serving        Cooks (35.7)                            620       71.8% 
related occupations (13)            Waiters (16.0) 
                                    Sup. Food Prep. (11.1) 
                                    Food Prep. (9.0) 

Sales and related occupations       Retail salesperson (28.0)               724       75.8% 
(18)                                Cashier (22.0) 
                                    Sup. Retail sales (19.6) 
                                    Sales rep. whls/mfg (6.2) 

Office and administrative           Stock clerk (30.8)                      559       59.4% 
support occupations (51)            Customer Service rep (13.8) 
                                    Shipping/Rec. clerk (10.4) 
                                    Dispatcher (4.5) 

Construction and extraction         Carpenters (24.2)                       1087      60.5% 
occupations (48)                    Construction laborer (22.0) 
                                    Electrician (7.6) 
                                    Plumbers/pipefitters (6.6) 

Installation, maintenance, and      Auto tech/mechanic (24.3)               452       48.5% 
repair occupations (36)             HVAC mechanic/installer (10.0) 
                                    Computer/office machine repair (7.3) 
                                    Auto body repair (6.9) 

Production occupations (81)         Production worker (15.0)                547       43.7% 
                                    Assembler/fabricator (12.3) 
                                    Welding/soldering worker (9.5) 
                                    Metal worker/plastic worker (7.0) 

Transportation and material         Laborer and freight movers (32.8)       739       77.0% 
moving occupations (34)             Driver/Sales, truck driver (22.7) 
                                    Cleaner of Vehicle/equip. (11.6) 
                                    Industrial trk/tractor operator (9.9) 

Table 4: Employer and Occupational Mobility for Individuals working in t and t+1 

Each cell reports: percent / row percent / col. percent 

                                    Same Employer t+1    Diff. Employer t+1    Total 
Same Occupation (10 Cat.) t+1       59% / 84% / 92%      11% / 16% / 31%       70% 
Diff. Occupation (10 Cat.) t+1      5% / 16% / 8%        25% / 84% / 69%       30% 

Table 4 provides details on the mobility of this cohort of workers. The table shows that even 
using the broader 10 occupation categories, workers have a 30% chance of working in a different 
occupation in adjacent periods. Most notable is that 17% of occupation changes at the 10 category 
level occur within the same firm. These occupation changes would otherwise be ignored using the 
previous literature’s edits. 
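
The row and column percentages in table 4 follow directly from the four joint shares, and the 17% figure is the row share of occupation changers who stay with the same employer. The short check below transcribes the joint shares from table 4; the variable and function names are illustrative.

```python
# Joint shares (percent of workers employed in both t and t+1), from table 4.
joint = {
    ("same_occ", "same_emp"): 59, ("same_occ", "diff_emp"): 11,
    ("diff_occ", "same_emp"): 5,  ("diff_occ", "diff_emp"): 25,
}


def row_share(occ, emp):
    """Share of the occupation row that falls in the given employer column."""
    row_total = sum(v for (o, _), v in joint.items() if o == occ)
    return joint[(occ, emp)] / row_total


def col_share(occ, emp):
    """Share of the employer column that falls in the given occupation row."""
    col_total = sum(v for (_, e), v in joint.items() if e == emp)
    return joint[(occ, emp)] / col_total


within_firm_changes = row_share("diff_occ", "same_emp")     # 5 / 30
changers_among_stayers = col_share("diff_occ", "same_emp")  # 5 / 64
```

Here `within_firm_changes` is 5/30, or about 17% of all occupation changes, while `changers_among_stayers` is 5/64, or about 8% of workers who remain with the same employer. The table entry of 16% for the first quantity presumably reflects rounding of the underlying unrounded shares.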

5 Results 

One of the primary deliverables of the model is the set of estimates of the distributional parameters 
and the correlation across occupational abilities. These estimates are found in table 5. The first 
row reports the standard deviation of the unconditional, marginal ability distribution for each 
occupation. Given that \(\mu_{ij}\) enters additively in the log-wage specification, these values have a 
direct interpretation as the approximate percentage difference in wages, all else equal. For example, the ability 
standard deviation for sales is 0.33. This means that an individual with a sales ability match one 
standard deviation above zero will receive a roughly 33% higher wage than an individual whose sales ability 
is at the mean. These numbers represent the potential gains from search, where an individual can 
potentially realize very large wage increases simply by finding an occupation they are well suited 
for. 


The second row in table 5 contains the estimate of the covariance of schooling ability and 
occupational ability. Using the expression in equation (3) for the initial beliefs regarding 
occupational ability conditional on schooling ability, we get an interpretation for this parameter. Since 
\(E_{i1}(\mu_{ij})\) is proportional to schooling ability \(\mu_{is}\) with coefficient \(\delta_{sj}\), we interpret the estimates of \(\delta_{sj}\) as the expected match value for an 
individual with schooling ability one standard deviation above the mean of the distribution. For example, an individual 
one standard deviation above the mean of the schooling ability distribution will expect wages in professional occupations 
to be 16% higher due to their ability compared to an individual one standard deviation below the mean 
(2 x 0.0816), holding constant the level of education. 

The lower triangular matrix in table 5 is the estimated correlation matrix across occupational 
abilities. With the exception of one element, this matrix is uniformly positive, implying an under¬ 
lying element of general ability that is persistent throughout the correlation matrix. This means 
that when a worker learns they are good at one thing, they are likely to be good at everything. 
The single exception is between schooling ability and transportation workers. The negative covari¬ 
ance suggests that those with high schooling ability will likely be less productive in transportation 
occupations, holding the level of schooling constant. 

Looking across this correlation structure we detect some additional patterns that surface from 
the data. For example, the second largest correlation, that between sales occupations and office 
and administrative occupations (0.7361), is between occupations often defined by the 2002 census 
code as a single occupation, "Sales and Office Occupations". This is to say that the census has 
likely grouped these occupations based on the assumption that they share a similar 
set of skill requirements. Given the estimated high correlation coefficient, the data suggest there 
may be some merit to this claim (although they are not perfectly correlated). The same holds true 
between production occupations and maintenance and repair occupations, which have the highest 
correlation (0.7448). 

One of the drawbacks of the raw correlation matrix is that it provides little information regarding 
the relative correlation between any two elements, meaning that a spurious correlation could exist 
between two occupations that is driven by the two occupations being correlated with a third 
occupation. As an example, it may not be clear why sales occupations and production occupations 
have a correlation of 0.4787. One possibility is that the skills required are common across these two 
occupations, so a high ability in one implies a high ability in the other. The second possibility is that 
they share no skill requirements and the entire correlation is driven by the fact that sales occupations 
are correlated with maintenance and repair occupations, and likewise production occupations are 
also correlated with maintenance occupations. In that case, any relationship we observe between sales and 
production is due entirely to their relationship to maintenance and repair. 

Table 5: Distributional Parameters and Correlation Structure 

a Standard errors are reported in parentheses and are constructed from 180 bootstraps of the model. 
** Denotes significance at the 5% level. 
* Denotes significance at the 10% level. 

To address this issue, table 6 compares the correlation of two occupations in isolation, netting 
out the effect of the other \(J - 2\) occupations. This procedure isolates the direct correlation effect 
and removes the secondary effects mentioned above. The most striking result from table 6 is that 
about one-third of the correlation pairs are now negative. These negative numbers 
imply that it is possible for workers to learn they are good in one occupation by 
learning they are bad at another occupation. 
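
A three-occupation example shows how a positive raw correlation can turn negative once a third occupation is netted out. The sketch below assumes the standard partial correlation formula; the paper does not spell out the exact computation behind table 6, and the correlations used here are hypothetical values, not the table 5 estimates.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after netting out a third variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical raw correlations (not the table 5 estimates).
r_sales_prod = 0.5
r_sales_maint = 0.8
r_prod_maint = 0.8

direct = partial_corr(r_sales_prod, r_sales_maint, r_prod_maint)
```

With these inputs `direct` is (0.5 - 0.64) / 0.36, roughly -0.39: sales and production are positively correlated overall, but less so than their common link to maintenance would predict, so the direct relationship is negative. With \(J\) occupations the same idea applies with the other \(J - 2\) occupations netted out jointly.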

For example, returning to the paired correlation between sales occupations and production 
occupations, in table 6 it is -0.4245. This is in contrast to the +0.4787 found in the full 
correlation structure in table 5. To see what these numbers mean, suppose a worker learns 
they have a high ability in maintenance occupations. Looking at table 5, they will infer that they 
have, among other things, a high ability in sales occupations and production occupations. The 
fact that the conditional correlation between sales and production is negative means that if the 
worker then moves to a production occupation and discovers a poor match, their belief about 
their sales match will actually increase. In simple terms, assume there are two 
components that make up ability for maintenance and repair occupations, one that is related to 
sales and one that is related to production. By only observing their maintenance and repair ability, 
the worker cannot disentangle which source is driving their high ability. Once they learn, by gaining 
production experience, that it is not the production component, they infer that it is 
the sales component, increasing their expectation of their sales ability. 
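
This updating logic can be reproduced with the conditional mean of a joint normal. The sketch below makes strong simplifying assumptions that are not the paper's: abilities are standardized (mean zero, unit variance), the worker observes maintenance and production ability without noise, and the correlations are hypothetical. It is meant only to show the direction of the belief revision.

```python
def conditional_mean_sales(maint, prod, r_sm, r_sp, r_mp):
    """E[sales | maint, prod] for zero-mean, unit-variance joint normal abilities."""
    det = 1.0 - r_mp ** 2
    # Regression weights: Cov(sales, (maint, prod)) times the inverse of Cov(maint, prod).
    w_maint = (r_sm - r_sp * r_mp) / det
    w_prod = (r_sp - r_sm * r_mp) / det
    return w_maint * maint + w_prod * prod


r_sm, r_sp, r_mp = 0.8, 0.5, 0.8     # hypothetical correlations

after_maint = r_sm * 1.0              # E[sales | maint = 1]
expected_prod = r_mp * 1.0            # E[prod | maint = 1]
after_bad_prod = conditional_mean_sales(1.0, -0.5, r_sm, r_sp, r_mp)
```

After learning that maintenance ability is one standard deviation above the mean, expected sales ability is 0.8. Discovering a production ability of -0.5, well below the 0.8 that was expected, raises expected sales ability to about 1.31, because the weight on production in the conditional mean is negative.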

The estimates of the wage parameters are presented in table 7. Given the non-linear way 
education and experience are modeled, the bottom half of the table contains a number of imputed 
returns based on the estimates. The wage estimates suggest an interesting trade-off workers 
may face, highlighted in Miller (1984), between occupations with high expected wages and 
low variance in unknown abilities versus occupations with low expected wages but high variance. For example, a 
white male high school graduate choosing their first occupation in construction can expect about 
15% higher wages than if they worked in a sales occupation. However, the standard deviation in 
unknown occupational ability for sales occupations is about 20% higher than in construction, implying 
there are potentially more informational benefits from sales occupations. 

The structural choice estimates are in table 8. The coefficient of relative risk aversion, \(\rho\), is 
estimated at 2.433. This figure represents a moderate level of risk aversion and is consistent with 
other findings in the literature on individuals' risk preferences (e.g. Mehra and Prescott (1985)). In 
comparison to the entry costs, the estimated level of risk aversion implies that it is a less significant 
search friction. Entry costs vary across occupations. The main entry cost, \(\alpha_{j2}\), measures the cost of 
finding work in an occupation if the worker was not engaged in that occupation in the previous period. 
An additional entry cost, \(\alpha_{j3}\), is incurred if the worker also has no experience in that occupation. 
Production occupations have the highest entry costs for a new worker of all of the occupations. 
Given the marginal utility of income at entry level wages, the entry cost for production occupations 
of (-1.952 - 5.193) is equivalent to about $26,000, which is more than the entry level annual salary.\(^{21}\) 
The entry cost for an individual who is returning to a production job is much less, at about $7,000, 
or about 1/3 of an annual salary. This figure makes sense if 4 months is the approximate time it 
takes to secure a job in production. 
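
The dollar figures can be reproduced from the marginal utility reported in footnote 21. The sketch below is a first-order conversion under two labeled assumptions: that 1 util corresponds to 1/0.5425 dollars of hourly wage over 2,000 hours, as in the footnote, and that the -1.952 term is the re-entry cost \(\alpha_{j2}\) for production while -5.193 is the additional no-experience cost \(\alpha_{j3}\) (this split matches the $7,000 figure but is not stated explicitly in the text). The food service coefficients are taken from table 8.

```python
MARGINAL_UTILITY = 0.5425   # utils per $1 of hourly wage, from footnote 21
HOURS_PER_YEAR = 2000       # annual hours assumed in footnote 21


def utils_to_dollars(utils):
    """Annual dollar amount that yields the same utility change (first-order)."""
    return utils / MARGINAL_UTILITY * HOURS_PER_YEAR


one_util = utils_to_dollars(1.0)                    # about $3,700
production_new = utils_to_dollars(1.952 + 5.193)    # about $26,000
production_return = utils_to_dollars(1.952)         # about $7,000
food_new = utils_to_dollars(1.199 - 0.635)          # about $2,000
food_return = utils_to_dollars(1.199)               # about $4,400
```

The food service case shows why re-entry can cost more than initial entry in this specification: the no-experience coefficient for food service is positive (0.635), so it offsets part of the entry cost for a worker with no food service experience.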

Food service on the other hand has the lowest entry costs for new workers with no food service 
experience, with a dollar equivalent of about $2,000. Those returning to food service face entry 
costs of about $4,400. That re-entry costs are double the initial entry costs implies workers 
are more reluctant to return to food service after leaving than they were to work in it initially. It 
is unclear whether these two extremes of initial entry costs ($26,000 for production or $2,000 for 

21 \(\partial u / \partial wage = \alpha_w \, wage^{-\rho}\). With entry level wages at about $9.20, the marginal utility of wages is 0.5425. This 
number is the marginal value of a $1 increase in wages. If workers work 2,000 hours per year, then 1 util is equivalent 
to about $3,700. 
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Table 7: Wage Estimates and Imputed Returns to Accumulated Human Capital a 

          Man./                                                  Office/               Maint./
          Bus.       Prof.      Service    Food       Sales      Admin.     Construct. Repair     Production Transp.
Constant  1.4626*    0.9331     2.5066**   1.3726**   1.6366**   2.5048**   2.8881**   3.8255**   2.6864**   1.7220**
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b Imputed by \(\theta_{j1} + (ed = 12)\,\theta_{j2} + (ed = 12)^2/100\,\theta_{j3}\) 

c Imputed by \([(\theta_{j1} + (ed = 16)\,\theta_{j2} + (ed = 16)^2/100\,\theta_{j3}) - (\theta_{j1} + (ed = 12)\,\theta_{j2} + (ed = 12)^2/100\,\theta_{j3})]/4\) 
d Imputed by \((\theta_{j1} + (Expr_j = 4)\,\theta_{j5} + (Expr_j = 4)^2/100\,\theta_{j6})/4\) 

** Denotes significance at the 5% level. 

* Denotes significance at the 10% level. 



Table 8: Structural Choice Estimates a 

Employment

                                    Manager/                    Other         Food
                                    Business      Professional  Service       Service       Sales
Non-Pecuniary Constant, α_j1        5.930         6.462*        8.441**       4.012         9.590**
                                    (4.545)       (3.845)       (4.263)       (4.247)       (4.262)
Entry Cost (d_t-1 ≠ j), α_j2        -2.922**      -2.059**      -2.329**      -1.199**      -2.097**
                                    (0.382)       (0.286)       (0.231)       (0.187)       (0.205)
No Experience (Expr_j = 0), α_j3    -0.119        -3.025**      -1.287        0.635         -1.708*
                                    (1.543)       (1.509)       (0.917)       (1.025)       (0.965)

                                    Office/
                                    Admin.        Construction  Maintenance   Production    Transportation
Non-Pecuniary Constant, α_j1        9.118**       8.739**       13.636**      13.878**      10.488**
                                    (4.281)       (4.068)       (4.271)       (4.604)       (4.177)
Entry Cost (d_t-1 ≠ j), α_j2        -1.635**      -1.423**      -2.441**      -1.952**      -1.766**
                                    (0.180)       (0.163)       (0.262)       (0.208)       (0.177)
No Experience (Expr_j = 0), α_j3    -2.552**      -3.179**      -3.997**      -5.193**      -1.189
                                    (0.943)       (0.962)       (1.258)       (1.123)       (0.992)

Utility Over Income and Risk b

Coef. on Wages, α_w                 120.031       [7.350, 1797.669]
Rel. Risk Aversion, ρ               2.433         [1.285, 3.806]

Schooling

Constant, α_s1                      4.453**       (0.924)
College Attendance, α_s2            -3.828**      (0.754)
Re-entry HS, α_s3                   -3.519**      (0.200)
Re-entry Col., α_s4                 -1.289**      (0.106)
Schooling Ability, α_s5             0.751**       (0.243)

a β is fixed to 0.95. Standard errors are reported in parentheses and are constructed from 180 
bootstraps of the model. 

b Bracketed numbers represent the 95% confidence interval of the parameter estimate from the 
180 bootstraps. 

** Denotes significance at the 5% level. 

* Denotes significance at the 10% level. 
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food service) are reasonable in comparison to the previous literature’s estimates given that these 
occupations are typically aggregated together into blue collar occupations. 

Entry costs may be more or less relevant in the context of the occupation-specific non-pecuniary 
constant, \(\alpha_{j1}\). Even though production has the largest entry costs, it also has the largest 
non-pecuniary benefit, leaving it on net the occupation with the third highest (behind transportation 
and maintenance) non-pecuniary benefit for new entrants (13.878 - 1.952 - 5.193 = 6.733). This very large 
constant means that once workers have experience in production, and in particular have worked in 
production in the previous period, they are very likely to work in production in the next period. 
This pressure to continue in the occupation from the previous period functions as a major friction 
to sorting on ability: even workers who receive poor information on their ability will be 
reluctant to seek out a better match. 
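
The ranking in the previous paragraph can be reproduced directly from the table 8 point estimates. The sketch below is illustrative only (the dictionary and function names are hypothetical): it sums the non-pecuniary constant, the entry cost, and the no-experience cost for each occupation and sorts the results.

```python
# Table 8 point estimates per occupation:
# (non-pecuniary constant, entry cost, additional no-experience cost).
ESTIMATES = {
    "Manager/Business": (5.930, -2.922, -0.119),
    "Professional": (6.462, -2.059, -3.025),
    "Other Service": (8.441, -2.329, -1.287),
    "Food Service": (4.012, -1.199, 0.635),
    "Sales": (9.590, -2.097, -1.708),
    "Office/Admin.": (9.118, -1.635, -2.552),
    "Construction": (8.739, -1.423, -3.179),
    "Maintenance": (13.636, -2.441, -3.997),
    "Production": (13.878, -1.952, -5.193),
    "Transportation": (10.488, -1.766, -1.189),
}


def net_benefit_new_entrant(occupation):
    """Net non-pecuniary benefit for a worker with no experience in the occupation."""
    return sum(ESTIMATES[occupation])


# Occupations ordered from highest to lowest net benefit for new entrants.
ranking = sorted(ESTIMATES, key=net_benefit_new_entrant, reverse=True)
for name in ranking:
    print(f"{name:18s} {net_benefit_new_entrant(name):7.3f}")
```

Because the sketch uses point estimates only, it says nothing about whether the differences between occupations are statistically distinguishable given the bootstrap standard errors.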

Turning to the results on schooling preferences, on average individuals receive a positive utility 
from attending school, with a substantial reduction in utility for attending college. At median entry 
level wages, with the marginal utility of wages equal to 0.5425, the disutility of college of -3.828 implies 
a cost of about $14,000 per year of college. Schooling ability significantly affects the decision to 
attend school. The coefficient estimate of 0.751 implies that an individual one standard 
deviation above the mean of the schooling ability distribution receives the equivalent of a $2,800 tuition subsidy 
compared to an individual at the mean. 

6 Analysis: Occupational Choice, Sorting on Ability, and Wages 

This section explores in greater detail the importance of sorting on ability in observed occupational 
choices. The large variances of the occupational ability distribution indicate that workers have 
much to gain from learning where they fall in each distribution. However, in reality, there are 
a myriad of factors in addition to sorting on ability that drive occupational choice. Using the 
parameter estimates, we can simulate career paths to gain insight into the capacity 
of workers to sort on ability and, perhaps more importantly, the effect that sorting on ability has on 
wage growth. 
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We can define the individual's expected ex-ante occupational match as, 

\[
\tilde{\mu}_{it} = \frac{\sum_{j=1}^{J} Q_j(i,t)\,\mu_{ij}}{\sum_{j=1}^{J} Q_j(i,t)} \tag{44}
\]

where \(Q_j(i,t)\) represents the individual's optimal choice strategy given the state at time \(t\). This 
measure has a nice interpretation in that as it moves away from their average match, it implies that 
the worker is choosing a match with greater certainty. For example, if the ex-ante expected ability equals 
their maximum (or minimum) ability, then we can say that they are choosing their maximum (or 
minimum) ability with certainty. 
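
The interpretation above can be illustrated with a small numerical sketch. It assumes that the ex-ante expected match is the choice-probability-weighted average of the worker's occupational abilities, normalized by the total probability of working; the function name and the ability values are hypothetical and are not taken from the estimates.

```python
def expected_match(choice_probs, abilities):
    """Choice-probability-weighted average ability across occupations.

    choice_probs need not sum to one, since non-work options such as
    schooling may carry positive probability; the weights are renormalized.
    """
    total = sum(choice_probs)
    if total <= 0:
        raise ValueError("choice probabilities must have positive mass")
    return sum(q * m for q, m in zip(choice_probs, abilities)) / total


# Hypothetical match values for a worker facing four occupations.
abilities = [0.30, -0.10, 0.05, -0.25]

certain_best = [1.0, 0.0, 0.0, 0.0]   # chooses the best match for sure
certain_worst = [0.0, 0.0, 0.0, 1.0]  # chooses the worst match for sure
uniform = [0.25, 0.25, 0.25, 0.25]    # chooses completely at random

print(expected_match(certain_best, abilities))
print(expected_match(certain_worst, abilities))
print(expected_match(uniform, abilities))
```

A degenerate choice rule places the expected match at one end of the worker's ability range, while a uniform rule places it at the simple average, which is the sense in which distance from the average match signals certainty.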

The empirical model is robust in the sense that it allows many other factors not relating to 
ability to influence occupational choice (e.g. entry costs and risk aversion), meaning that the model 
does not require people to sort on ability. Therefore, on an individual level, if we see a worker choose 
their maximum occupation with certainty we cannot say for sure that this is driven by sorting on 
ability because this outcome could be driven by outside factors. 22 However, the theoretical ability 
matching model, in the absence of search frictions, says that the fraction of workers whose ex-ante 
expected ability equals their maximum ability should continually improve over time. If we observe 
this trend in the data on an aggregate level, then this suggests that sorting on ability is relevant 
to occupational choice. However, if other factors dominate the workers' capacity to sort on ability, 
then there will likely be no discernible patterns in the ex-ante expected ability. 

To see what trends (if any) exist in the data, I simulate 1 million careers using the parameter 
estimates. Rather than looking at the full distribution of abilities, each worker's set of ten 
ability matches is grouped into three categories: \(O_{1-3}\) corresponds to the range of the individual's 
top three occupational abilities, \(O_{7-10}\) corresponds to the range of the individual's worst three 
occupational abilities, and the remaining category, \(O_{4-6}\), includes the ability levels that are not 
in the best three or worst three. Table 9 tracks the fraction of workers whose ex-ante expected 

22 For example, they could have learned that this is their highest ability and are thus sorting into that occupation. 
Alternatively, they could be a high school drop-out, and perhaps it is a common rule for all high school drop-outs to 
go into transportation occupations, and this individual happens to have their highest match in transportation. 
Another possibility is that workers always choose occupations at random and then stay in that job forever. In this 
worker's case, that just happened to be their high ability occupation. The first explanation entails the worker sorting 
on ability, while the latter two explanations are driven by strictly random search processes. 
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ability falls in each segment of the distribution of their abilities. The analysis is broken down 
by educational attainment because these workers have very different priors, occupational search 
patterns, and labor market attachment. 

It is important to understand what table 9 is measuring. It is the ex-ante expected ability, not 
the ex-post. Therefore, the column \(\Pr(\tilde{\mu} \in O_{7-10})\) does not measure the fraction of workers who 
choose an occupation among their bottom three abilities ex-post; it is the fraction who are likely to choose 
one of their worst occupations ex-ante. To draw the distinction, assume workers choose occupations 
completely randomly (i.e. with probability \(1/J\) of choosing each occupation). Then ex-post, 30% 
of workers will end up in one of their three worst occupations. This fraction is not very informative 
because these workers ended up in these occupations completely unintentionally. The measure we 
are interested in, \(\Pr(\tilde{\mu} \in O_{7-10})\), is the fraction of workers who intentionally, in some sense, 
are likely to choose one of their worst three occupations. In the context of the random decision 
model, this value is zero because the randomness implies that ex-ante their match is likely in the 
middle of the distribution. Looking at ex-ante expected ability is a way to separate intentional and 
unintentional occupational sorting on ability. 
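
The ex-post half of this distinction is easy to verify by simulation. The sketch below is illustrative and is not the simulation used for table 9: it assumes ten occupations with independent standard normal abilities (the independence and the distribution are simplifying assumptions made here, not features of the estimated model) and has each worker pick an occupation uniformly at random.

```python
import random


def share_in_worst_three(n_workers=50_000, n_occupations=10, seed=0):
    """Ex-post share of workers landing in one of their three worst matches
    when occupations are chosen uniformly at random."""
    rng = random.Random(seed)
    in_worst_three = 0
    for _ in range(n_workers):
        # Hypothetical ability draws; the estimated model uses correlated abilities.
        abilities = [rng.gauss(0.0, 1.0) for _ in range(n_occupations)]
        chosen = rng.randrange(n_occupations)
        third_worst = sorted(abilities)[2]
        if abilities[chosen] <= third_worst:
            in_worst_three += 1
    return in_worst_three / n_workers


share = share_in_worst_three()
print(f"ex-post share in worst three occupations: {share:.3f}")
```

The simulated share is close to 3/10 regardless of the ability distribution, because it reflects only the uniform choice rule. That is why the ex-post fraction carries no information about intentional sorting.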

The ex-ante expected match at zero years of labor market experience is quite different across 
educational categories. The table shows that nearly all (96.6%) high school graduates with zero 
years of labor market experience have an expected match in the middle range of their match values. 
It is not surprising that the overwhelming fraction of high school entrants have an expected ability 
in the middle of their ability distribution since they are new entrants. However, given initial choices, 
2.4% will end up in one of their top three occupations, and 1.0% end up in one of their bottom 
occupations. It is also not surprising that these fractions are not equal and are skewed toward the upper 
end of the distribution. What this means is that high school graduates have priors that will on 
average put them in a higher ability occupation. Therefore, however high school graduates 
choose occupations, 2.4% are likely to end up in one of their best matches and only 
1.0% in a poor match. 

However, given that these are new entrants into the labor force, it is likely that the large pooling 
in the middle of the distribution is due to the fact that sorting on ability at this stage of the career 
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Table 9: Occupational Sorting by Labor Market Experience and Education 

                   Labor Market
Education Group    Experience (Yrs.)    Pr(μ̃ ∈ O_7-10) a    Pr(μ̃ ∈ O_4-6)    Pr(μ̃ ∈ O_1-3) b

Less Than HS       0                    1.9%                 86.9%             11.2%
                   1                    7.4%                 72.9%             19.7%
                   2                    7.6%                 71.4%             21.0%
                   3                    7.6%                 70.6%             21.8%
                   4                    7.6%                 69.6%             22.7%
                   5                    7.8%                 68.9%             23.3%
                   6                    7.8%                 68.0%             24.2%
                   7                    8.0%                 67.1%             24.9%
                   8                    8.4%                 65.9%             25.7%

HS                 0                    1.0%                 96.6%             2.4%
                   1                    8.8%                 77.3%             13.9%
                   2                    10.0%                73.2%             16.8%
                   3                    10.8%                70.3%             18.9%
                   4                    11.3%                68.1%             20.7%
                   5                    11.8%                66.1%             22.2%
                   6                    12.2%                64.1%             23.7%
                   7                    12.5%                62.4%             25.1%
                   8                    12.6%                61.4%             26.0%
                   9                    13.2%                59.5%             27.4%

College            0                    8.9%                 74.8%             16.3%
                   1                    14.8%                61.4%             23.8%
                   2                    16.2%                58.0%             25.8%
                   3                    16.7%                56.3%             27.0%
                   4                    16.6%                55.6%             27.8%
                   5                    16.1%                55.3%             28.6%

a The fraction of individuals likely to choose one of their worst three occupational abilities. 
b The fraction of individuals likely to choose one of their top three occupational abilities. 
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is more or less unintentional (random). 

The difference between 2.4% and 1.0% is likely inconsequential. However, for high school drop-outs 
we see the same pattern, but on a much larger and more convincing scale. Given their optimal 
choices, 11.2% have an expected match that falls within the range of their top three occupational 
matches prior to any labor market experience. This is evidence that for these workers, their 
priors really matter. Given their knowledge of their schooling ability and the correlation structure, 
they are able to narrow down the occupations in which they are likely to have high (or low) ability, and, 
most importantly, this shows strong evidence that they are sorting (intentionally) on this information. 
Conversely, only 1.9% have an expected match in the range of their bottom three abilities. 

College graduates show a similarly strong pattern in their initial priors, with 16.3% likely to 
choose one of their top three occupational abilities, even with no labor market experience. What 
is interesting is that this number is only twice as large as the fraction of college graduates who 
are likely to choose one of their worst three occupations (compared to 6 times as large for high 
school drop-outs). What drives these 8.9% of college graduates to choose one of their three 
worst occupations is unclear. Two possible explanations are that college graduates tend to choose 
occupations with higher variances in abilities (managers/business operations and professionals), 
or that they knowingly have a poor match but choose the occupation for other reasons (e.g. it 
provides high returns to education). 

The table tracks the changes in this distribution across labor market experience. The largest 
change in these percentages occurs between zero years and one year of labor market experience. 
With one year of experience, individual choices become much more heterogeneous. For high school 
graduates, 8.8% will likely choose one of their three worst occupational matches, up from 1.0% when 
they have zero years of labor market experience. What drives this large increase is the switching costs 
in the model. For workers with no labor market experience who happen to initially choose one 
of their worst occupations, the switching costs in the model suggest that these workers are likely 
to stay in this occupation despite the low match. Given that these workers are likely to choose 
this poorly matched occupation again, their ex-ante expected ability will fall accordingly and the 
fraction of workers who are likely to work in one of their three worst occupations will increase. 
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For high school graduates we observed a 7.8 point increase in the fraction of workers in one of 
their worst matches between zero and one year experience. For these same workers we observe a 
11.5 point increase in the fraction who are likely to work in one of their best occupations. This 
asymmetric increase gives positive evidence that, although workers face switching costs, sorting on 
ability has a dominant role in occupational choice. Looking further down the career with 9 years 
labor market experience, 27.4% of high school graduates are likely to work in one of their top three 
occupations. Comparatively only 13.2% are likely to work in one of their worst three. This strongly 
indicates that sorting on ability plays an important role in occupational decisions. 

High school graduates show the largest gain in the fraction expecting to work in one of their 
top three occupations over the initial stages of their career. A similar pattern exists for high 
school drop-outs and college graduates; however, since these workers relied more heavily on their 
priors, the gains from learning in the labor market were not as large. The more asymmetric 
these distributions become over time, the greater the importance of intentional sorting on ability 
in making career decisions. The asymmetry can be measured as the fraction of workers likely to 
work in one of their top three occupations relative to the fraction likely to work in one of their bottom 
three. By this measure, sorting on ability is most important for high school drop-outs. At five 
years of experience these workers are nearly three times more likely to be working in one of their 
top three occupations than in one of their bottom three occupations. By comparison, high 
school graduates are about twice as likely and college graduates are about 1.75 times as likely. The high fraction 
of college graduates in low matches (16.1%) suggests that factors other than sorting on ability play an important role 
in the occupational decisions of college graduates. One contributing factor 
may be that college graduates choose a particular occupation not because they have a high 
ability in that occupation, but because the returns to college are high in that occupation. If they 
exercise flexibility in occupational choice, they may forgo some of the returns to education. 
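
The asymmetry measure described above is a simple ratio, and the values at five years of experience can be computed from table 9. The sketch below is illustrative; the variable and function names are hypothetical.

```python
# Table 9 at five years of labor market experience:
# (share likely in bottom three, share likely in top three), in percent.
TABLE9_YEAR5 = {
    "Less than HS": (7.8, 23.3),
    "HS": (11.8, 22.2),
    "College": (16.1, 28.6),
}


def asymmetry(bottom_share, top_share):
    """Ratio of the top-three share to the bottom-three share."""
    return top_share / bottom_share


ratios = {group: asymmetry(*shares) for group, shares in TABLE9_YEAR5.items()}
for group, ratio in ratios.items():
    print(f"{group:13s} {ratio:.2f}")
```

A ratio of one would indicate no net tendency toward better matches. The ordering of the three ratios is what supports the claim that sorting on ability matters most for high school drop-outs.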

To assess the wage gains associated with ability sorting, I focus only on the match component 
of wages, \(\mu_{ij}\), which measures the percentage contribution of ability to wages. We are 
interested in looking at how the average match value evolves over the career. Table 10 shows the 
expected value of \(\mu_{ij}\) at different points in the career for high school graduates. Given the optimal 
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Table 10: Wage Gains (in percent) Due to Sorting on Ability For High School 
Graduates 

Labor Market                      E(μ_ij)       E(μ_ij)               E(μ_ij)
Experience (Yrs.)    E(μ_ij)      (working)     (2nd best, can        (2nd best, can't
                                                exit labor force)     exit labor force)
                     (a)          (b)           (c)                   (d)

0                    -0.1%        0.5%          0.3%                  1.1%
1                    2.4%         3.5%          1.8%                  1.1%
2                    2.9%         4.0%          2.2%                  1.2%
3                    3.3%         4.4%          2.5%                  1.3%
4                    3.9%         4.9%          2.8%                  1.6%
5                    4.4%         5.4%          3.2%                  2.0%
6                    5.2%         6.1%          3.9%                  2.7%
7                    6.1%         6.8%          4.5%                  3.4%
8                    6.5%         7.1%          4.8%                  3.8%
9                    7.5%         8.0%          5.6%                  4.7%



decision rules \(Q_j(i,t)\), which give the probability that individual \(i\) chooses occupation \(j\) 
given their information at \(t\), the expected ability in the distribution is calculated as, 

\[
E(\mu_{ij}) = \frac{\sum_i \sum_j Q_j(i,t)\,\mu_{ij}}{\sum_i \sum_j Q_j(i,t)} \tag{45}
\]



Table 9 indicated that prior beliefs did not likely play a role in initial occupational decisions 
for high school graduates. As we can see in table 10, the expected match value for high school 
graduates with zero years of labor market experience is essentially zero, providing further evidence 
that these high school graduates possess little information to sort on ability. However, as their 
careers progress, the average ability increases steadily by 0.5% to 1.0% per year such that with 
nine years of labor market experience, on average, 7.5% of wages is due to sorting on ability for 
high school graduates. Given that table 9 shows that 13.2% of high school graduates are in one 
of their worst three matches, there is likely a lot of heterogeneity underlying this average ability. 
However, an average increase in wages of 7.5% is a sizable contribution to wage growth and, for 
most occupations, greater than the return to one year of college. 

A second important aspect of the model is how individuals' second best options 
evolve over time. The correlated learning structure suggests that as workers become more informed 
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about their ability in their current occupation, they will likely have more information about their 
abilities overall. Specifically, if workers are forced out of their number one occupational choice, 
they will be able to compensate for this loss using the information they have acquired. This is in 
contrast to the results generated from the independence assumption, where a job loss results in the 
destruction of the entire 7.5% wage gain. 

To analyze whether individuals' second best choices are improving with labor market experience, we 
look at the expected match for employed workers who are forced out 
of their preferred occupation and into an alternative occupation. These values are reported 
for high school graduates in table 10. The expected match for high school graduates conditional on 
working, reported in column (b), is slightly higher at all levels of labor market experience than 
the population average because those who actually choose employment are likely to do so because 
they have, on average, a higher ability. Column (c) shows the expected match if these same 
workers were forced out of their first choice occupation into a second choice. One option for these 
workers is to exit the labor force, and column (c) reports the expected ability allowing some workers to 
do so. While the second best choice for these workers is almost always worse than the 
expected match of their primary occupation, the expected ability in the second best occupations 
keeps pace with the growth in the expected ability for the primary occupation. Rather 
than losing the entire average 8.0% for employed high school graduates with 9 years of labor market 
experience, displaced workers move into occupations with an expected ability of 5.6%, 
implying only a moderate wage loss of 2.4 percentage points from the lower-quality match. 

Column (d) shows the expected ability if all of these workers are forced to find new occupations 
(i.e. workers are not allowed to exit the workforce). The expected ability of this population is lower 
than in the case where we allow for exits, implying that the individuals who are most likely to exit 
the labor force do so because they do not have a suitable second best option. While this number 
is lower, it is still quite large: workers are able to recover a sizable share of the wage loss 
from matching through correlated learning. 
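
The displacement losses discussed above are differences between columns of table 10, and they can be tabulated for every experience level. The sketch below is illustrative; the dictionary simply transcribes the table, and the function name is hypothetical.

```python
# Table 10, in percent, by years of labor market experience:
# (a) population, (b) working, (c) second best with exit, (d) second best without exit.
TABLE10 = {
    0: (-0.1, 0.5, 0.3, 1.1),
    1: (2.4, 3.5, 1.8, 1.1),
    2: (2.9, 4.0, 2.2, 1.2),
    3: (3.3, 4.4, 2.5, 1.3),
    4: (3.9, 4.9, 2.8, 1.6),
    5: (4.4, 5.4, 3.2, 2.0),
    6: (5.2, 6.1, 3.9, 2.7),
    7: (6.1, 6.8, 4.5, 3.4),
    8: (6.5, 7.1, 4.8, 3.8),
    9: (7.5, 8.0, 5.6, 4.7),
}


def displacement_loss(years, allow_exit=True):
    """Percentage-point drop in expected match when a working high school
    graduate is forced into the second best option."""
    _, working, with_exit, without_exit = TABLE10[years]
    second_best = with_exit if allow_exit else without_exit
    return working - second_best


for years in sorted(TABLE10):
    print(years,
          round(displacement_loss(years), 1),
          round(displacement_loss(years, allow_exit=False), 1))
```

At nine years of experience the loss is 2.4 points when exit is allowed and 3.3 points when it is not, compared with the full 8.0 points that would be lost if nothing learned carried over to other occupations.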

Table 11 shows the wage gains for the other education groups. College graduates appear to 
have the highest expected ability match, contributing to 9.1% of wages at five years of labor market 
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Table 11: Wage Gains (in percent) Due to Sorting on Ability by Education 

Labor Market Experience (Yrs.) a    Less than HS    High School    College Grad.

0                                   -3.7%           -0.1%          7.1%
1                                   -1.6%           2.4%           8.8%
2                                   -0.9%           2.9%           8.4%
3                                   -0.3%           3.3%           8.3%
4                                   0.4%            3.9%           8.5%
5                                   1.1%            4.4%           9.1%
6                                   2.0%            5.2%           -
7                                   3.0%            6.1%           -
8                                   4.0%            6.5%           -
9                                   -               7.5%           -

Random Match Values                 -5.6%           0%             3.9%

a Labor market experience for high school drop-outs begins at age 16, age 18 for 
high school graduates, age 19 for some college, and age 22 for college 
graduates. 

b The probability that an individual with a given level of education and labor 
market experience chooses an occupation possessing one of their worst three 
abilities. 

c The probability that an individual with a given level of education and labor 
market experience chooses an occupation possessing one of their top three 
abilities. 



experience. However, this number does not truly reflect the gains from sorting. Given the positive 
correlation structure, college graduates are likely to be much better in any of the occupations. We 
would expect their average match value to be positive regardless of the decision rule they used. The 
bottom row of table 11 shows the expected match value if individuals chose occupations with equal 
probability. For high school graduates the expected ability at zero years of experience is nearly identical 
to the expected match if individuals randomly choose occupations, suggesting again 
that these workers have little information on which to base their initial choices. 

College graduates, on the other hand, have an expected match of 3.9% if they choose occupations 
at random, implying an average wage gain of 5.2% (9.1 - 3.9) due to occupational sorting. 
What is perhaps most interesting is that the majority of these gains, 3.2%, are realized before these 
workers have any labor market experience; they achieve only an additional 2% on average once in the 
labor market. This means that for college graduates, priors play a very important role in sorting 
on ability. 
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High school drop-outs show a similar story. If this group were to choose occupations randomly, 
this would imply an expected occupational match of -5.6%, suggesting that these workers are, in 
general, lower ability. However, given their prior information, they are able to improve their 
position in the labor market by an average of about 2.0 percentage points. Although not as large 
as for college graduates, initial sorting is relevant for high school drop-outs. After eight years of labor 
market experience, high school drop-outs have an expected ability value of 4.0%, implying a gain on 
average of 9.6% (4.0 + 5.6). 
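
The decomposition used in the last three paragraphs, which splits the total gain from sorting into a part realized through priors at entry and a part realized through learning in the labor market, can be written out explicitly. The sketch below is illustrative: it transcribes the first row, the last reported row, and the random-match row of table 11 for each education group, and the names are hypothetical.

```python
# Table 11, in percent: expected match at entry, at the last reported
# experience level (8, 9, and 5 years respectively), and under random choice.
TABLE11 = {
    "Less than HS": {"entry": -3.7, "last": 4.0, "random": -5.6},
    "High School": {"entry": -0.1, "last": 7.5, "random": 0.0},
    "College Grad.": {"entry": 7.1, "last": 9.1, "random": 3.9},
}


def sorting_gains(group):
    """Return (gain from priors at entry, gain from learning, total gain),
    each measured relative to choosing occupations at random."""
    row = TABLE11[group]
    from_priors = row["entry"] - row["random"]
    total = row["last"] - row["random"]
    return from_priors, total - from_priors, total


for group in TABLE11:
    priors, learning, total = sorting_gains(group)
    print(f"{group:14s} priors {priors:5.1f}  learning {learning:5.1f}  total {total:5.1f}")
```

Because the last reported experience level differs across groups, the totals are not measured over the same horizon, which should be kept in mind when comparing them.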

The collection of results from the simulations begins to describe in detail how sorting on ability 
affects occupational choices and wage growth. In general the results show strong evidence that 
workers sort into occupations where they have higher ability as they learn through labor market 
experience. These results vary widely across educational groups: high school drop-outs 
appear to be the most likely to move out of poorly matched occupations. College graduates, on the 
other hand, are much more likely to stay in occupations where they have very low ability. The 
fact that either group would choose to stay in a poorly matched occupation highlights the role of 
other non-pecuniary factors which drive occupational decisions in addition to sorting on ability. 
The exact factors leading to the asymmetric sorting on ability by education level remain for future 
study. As high school drop-outs are the most likely to sort on ability, they are able to increase 
wages by 9.6% with eight years of labor market experience. High school graduates (with nine years) and 
college graduates (with five years) increase wages by 7.5% and 5.2%, respectively. 

7 Conclusion 

This paper develops and estimates an individual model of occupational choice and learning that 
allows for correlated learning across occupation specific abilities. Flexibly allowing information 
to enter the worker’s mobility decision in this manner broadens the type of career moves that 
occupational matching models can address. Specifically, the model is able to capture both lateral 
occupational moves and vertical occupation moves, where high ability individuals change careers 
due to promotion. More generally, the model not only addresses what drives individuals to change 
careers, but also how information affects their decision about which new occupation to enter. 
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Endogenizing information in this way significantly increases the computational burden. I address 
these empirical challenges by utilizing the Expectation and Maximization (EM) algorithm to 
uncover the persistent, high-dimensional unobservables in the data. This approach is particularly well 
suited for the learning framework as it breaks the curse of dimensionality so the computational 
complexity only grows linearly in the number of occupations. The empirical strategy likely has 
broad applicability to other learning models. 

The model is estimated on the National Longitudinal Survey of Youth 1997. The parameter 
estimates suggest that workers can potentially realize large wage gains by finding occupations they 
are well suited for. The model allows for heterogeneous priors over the distribution of ability. These 
priors are strongly related to educational attainment, which plays an important role in driving 
initial occupational choices. Using the parameter estimates I simulate careers to gain insight into 
the importance of sorting on ability in occupational decisions and its effect on wage growth. 
The simulations suggest that workers' willingness to sort on ability is highly related to educational 
attainment, where high school drop-outs are most likely to choose occupations based on ability, and 
college graduates are the least likely to sort on ability. This finding provides some understanding as 
to why college graduates are observed in the data to be less likely to change occupations compared 
to other cohorts. 
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A Misspecification Bias: Aggregating Occupations 


This section discusses the misspecification bias in the ability matching model if occupations are 
coarsely aggregated into blue collar and white collar, as is typically done in the occupational choice 
literature. Aggregating occupations assumes that the ability vectors are perfectly 
correlated. To the degree that this is not true, many of the structural parameters will be biased. 
For example, the returns to tenure will be overstated. If workers are actually searching among 
many occupations, which the econometrician assumes is only one occupation, then the returns to 
search will be perceived as returns to work experience. In a similar way, the average match quality 
will be biased upward, since people are selecting their highest match among multiple occupations 
that the econometrician assumes are one. 

Less obvious will be its effect on the variance parameters of the model. Inappropriately aggregating 
occupations will bias the variance of occupational match quality downward. The estimator 
will compensate for the downward bias in the ability variance by biasing the variance of 
the technology shock upward. This bias can be demonstrated with a simple stylized model. 

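Assume there are two occupations (1, 2) that the econometrician aggregates into a single occupation. 
Each worker observes one ability signal from each occupation, parameterized as follows for 
algebraic ease. 

\[
\begin{aligned}
w_{i1} &= \mu_{i1} + \varepsilon_{i1}, & \mu_{i1} &\sim N(0, \sigma^2), & \varepsilon_{i1} &\sim N(0, \sigma^2) \\
w_{i2} &= \mu_{i2} + \varepsilon_{i2}, & \mu_{i2} &\sim N(0, \sigma^2), & \varepsilon_{i2} &\sim N(0, \sigma^2)
\end{aligned}
\]

and \(E(\mu_{i1}\mu_{i2}) = \rho\). 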

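If the econometrician assumes both wage signals are from a single match \(\mu_i\), the posterior beliefs 
will be, 

\[
\begin{aligned}
E_i(\mu_i) &= \left((\sigma^2)^{-1} + 2(\sigma^2)^{-1}\right)^{-1}\left((\sigma^2)^{-1} \cdot 0 + (\sigma^2)^{-1}(w_{i1} + w_{i2})\right) = \frac{w_{i1} + w_{i2}}{3} \\
V_i(\mu_i) &= \left((\sigma^2)^{-1} + 2(\sigma^2)^{-1}\right)^{-1} = \frac{\sigma^2}{3}
\end{aligned}
\]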
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Given these values for the population, we can derive the population variance of match quality 


\[
V(\mu) = E(\mu_i^2) = E\left[V_i(\mu_i) + E_i(\mu_i)^2\right]
\]
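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\[
V(\mu) = \frac{7\sigma^2}{9} + \frac{2\rho}{9} < \sigma^2 \quad \text{if } \rho < \sigma^2 \text{ or, equivalently, } CORR(\mu_{i1}, \mu_{i2}) \neq 1
\]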
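
The step from the posterior moments to this expression can be spelled out as follows. The sketch uses only the assumptions of the stylized model, plus the assumption (made here for the derivation, and natural given the setup) that the noise terms are independent of the abilities and of each other.

```latex
\begin{aligned}
E\left[E_i(\mu_i)^2\right]
  &= \frac{E\left[(w_{i1} + w_{i2})^2\right]}{9}
   = \frac{V(w_{i1}) + V(w_{i2}) + 2\,\mathrm{Cov}(w_{i1}, w_{i2})}{9}
   = \frac{2\sigma^2 + 2\sigma^2 + 2\rho}{9}, \\
V(\mu)
  &= \frac{\sigma^2}{3} + \frac{4\sigma^2 + 2\rho}{9}
   = \frac{7\sigma^2 + 2\rho}{9}.
\end{aligned}
```

Since \(\rho \le \sigma^2\), with equality only under perfect correlation, the implied variance is strictly below the true value \(\sigma^2\) whenever the two abilities are not perfectly correlated.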


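Turning to the bias on the technology shock, since the total variance of wages is \(V(w) = 2\sigma^2\), 
subtracting the estimate of \(V(\mu)\) gives the estimate of \(V(\varepsilon)\), 

\[
V(\varepsilon) = 2\sigma^2 - \frac{7\sigma^2}{9} - \frac{2\rho}{9} > \sigma^2 \quad \text{if } CORR(\mu_{i1}, \mu_{i2}) \neq 1
\]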
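
The two biases can also be checked numerically. The following Monte Carlo sketch is illustrative and is not part of the estimation: it draws abilities and signals from the stylized model with hypothetical values \(\sigma^2 = 1\) and \(\rho = 0.5\), applies the misspecified single-match posterior, and computes the implied variances. The function name and the parameter values are assumptions made for the illustration.

```python
import math
import random


def simulate_aggregation_bias(sigma2=1.0, rho=0.5, n=100_000, seed=0):
    """Implied V(mu) and V(eps) when two occupations are treated as one.

    Requires |rho| <= sigma2 so that the ability covariance matrix is valid.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    slope = rho / sigma2                               # regression of mu2 on mu1
    resid_sd = math.sqrt(sigma2 - rho ** 2 / sigma2)   # keeps V(mu2) = sigma2
    sum_sq_post_mean = 0.0
    for _ in range(n):
        mu1 = rng.gauss(0.0, sigma)
        mu2 = slope * mu1 + rng.gauss(0.0, resid_sd)
        w1 = mu1 + rng.gauss(0.0, sigma)
        w2 = mu2 + rng.gauss(0.0, sigma)
        post_mean = (w1 + w2) / 3.0                    # misspecified posterior mean
        sum_sq_post_mean += post_mean ** 2
    v_mu = sigma2 / 3.0 + sum_sq_post_mean / n         # posterior variance + E[mean^2]
    v_eps = 2.0 * sigma2 - v_mu                        # residual wage variance
    return v_mu, v_eps


v_mu, v_eps = simulate_aggregation_bias()
print(f"implied V(mu):  {v_mu:.3f}  (formula: {(7 * 1.0 + 2 * 0.5) / 9:.3f})")
print(f"implied V(eps): {v_eps:.3f}  (formula: {(11 * 1.0 - 2 * 0.5) / 9:.3f})")
```

With these values the implied ability variance falls below the true value of one and the implied shock variance rises above it. Setting `rho` equal to `sigma2` removes both biases, as the formulas indicate.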

Given the high value of understanding sources of wage variation, the distribution of abilities and 
the selection problems stated earlier, these sorts of biases are extremely undesirable. Therefore, it 
is important that we push the empirical model to accommodate as many occupations as possible. 
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